Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you!

On Mon, Jul 27, 2015 at 1:43 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 As I mentioned, adjusting any settings such that segment files become so
 small that you lose the benefits of append-only writes, or that file
 creation/deletion becomes a bottleneck, might affect performance. It looks
 like the default setting for log.segment.bytes is 1GB, so given fast enough
 cleanup of old logs, you may not need to adjust that setting -- assuming
 you have a reasonable amount of storage, you'll easily fit many dozens of
 log files of that size.

 -Ewen




Re: deleting data automatically

2015-07-27 Thread Yuheng Du
If I want to get higher throughput, should I increase log.segment.bytes?

I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins. Is that what you mean?

If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
throughput? Thanks.

On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 You'll want to set the log retention policy via
 log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
 aggressive collection (e.g., on the order of seconds, as you specified),
 you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
 log.retention.check.interval.ms.




Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
As I mentioned, adjusting any settings such that segment files become so
small that you lose the benefits of append-only writes, or that file
creation/deletion becomes a bottleneck, might affect performance. It looks
like the default setting for log.segment.bytes is 1GB, so given fast enough
cleanup of old logs, you may not need to adjust that setting -- assuming
you have a reasonable amount of storage, you'll easily fit many dozens of
log files of that size.

-Ewen
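
As a rough, purely illustrative sketch of what that means in practice: with
the default 1GB segments, a broker with, say, 500GB of disk for logs holds on
the order of 500 segment files in total, so segment creation/deletion stays
infrequent and time-based retention alone can handle the cleanup, e.g.:

    # keep the default 1GB segment size and let retention delete old data
    log.segment.bytes=1073741824
    # e.g. keep data for a week
    log.retention.hours=168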

On Mon, Jul 27, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Thank you! What performance impact will there be if I change
 log.segment.bytes? Thanks.

-- 
Thanks,
Ewen


Re: deleting data automatically

2015-07-27 Thread Yuheng Du
Thank you! What performance impact will there be if I change
log.segment.bytes? Thanks.

On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 I think log.cleanup.interval.mins was removed in the first 0.8 release. It
 sounds like you're looking at outdated docs. Search for
 log.retention.check.interval.ms here:
 http://kafka.apache.org/documentation.html

 As for setting the values too low hurting performance, I'd guess it's
 probably only an issue if you set them extremely small, such that file
 creation and cleanup become a bottleneck.

 -Ewen




Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
I think log.cleanup.interval.mins was removed in the first 0.8 release. It
sounds like you're looking at outdated docs. Search for
log.retention.check.interval.ms here:
http://kafka.apache.org/documentation.html

As for setting the values too low hurting performance, I'd guess it's
probably only an issue if you set them extremely small, such that file
creation and cleanup become a bottleneck.

-Ewen
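
For reference, a minimal sketch of that setting in server.properties (the
value below is just an example, not a recommendation); the old
log.cleanup.interval.mins key from the 0.7-era docs should simply have no
effect on an 0.8 broker:

    # how often the broker checks for log segments eligible for deletion
    log.retention.check.interval.ms=30000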

On Mon, Jul 27, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 If I want to get higher throughput, should I increase log.segment.bytes?

 I don't see log.retention.check.interval.ms, but there is
 log.cleanup.interval.mins. Is that what you mean?

 If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt
 throughput? Thanks.


-- 
Thanks,
Ewen


deleting data automatically

2015-07-24 Thread Yuheng Du
Hi,

I am testing the Kafka producer's performance, so I created a queue and am
writing a large amount of data to it.

Is there a way to delete the data automatically after some time, say whenever
the data size reaches 50GB or the retention time exceeds 10 seconds, so that
my disk won't fill up and new data can still be written?

Thanks!


Re: deleting data automatically

2015-07-24 Thread gharatmayuresh15
You can configure that in the configs by setting the log retention options:

http://kafka.apache.org/07/configuration.html
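
For example, something along these lines in the broker's server.properties
(the value is made up, purely for illustration):

    # delete log data older than one hour
    log.retention.hours=1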

Thanks,

Mayuresh


 On Jul 24, 2015, at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
 
 Hi,
 
 I am testing the Kafka producer's performance, so I created a queue and am
 writing a large amount of data to it.
 
 Is there a way to delete the data automatically after some time, say whenever
 the data size reaches 50GB or the retention time exceeds 10 seconds, so that
 my disk won't fill up and new data can still be written?
 
 Thanks!


Re: deleting data automatically

2015-07-24 Thread Ewen Cheslack-Postava
You'll want to set the log retention policy via
log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
aggressive collection (e.g., on the order of seconds, as you specified),
you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
log.retention.check.interval.ms.
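
To make that concrete, a server.properties sketch for the kind of aggressive
cleanup asked about below might look roughly like this (values are purely
illustrative, not recommendations; if I remember right, log.retention.bytes
is applied per partition):

    # delete data once it is ~10 seconds old or a partition exceeds ~50GB
    log.retention.ms=10000
    log.retention.bytes=53687091200
    # roll segments frequently so old data actually becomes eligible for deletion
    log.segment.bytes=104857600
    log.roll.ms=10000
    # check for deletable segments often
    log.retention.check.interval.ms=5000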

On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:

 Hi,

 I am testing the Kafka producer's performance, so I created a queue and am
 writing a large amount of data to it.
 
 Is there a way to delete the data automatically after some time, say whenever
 the data size reaches 50GB or the retention time exceeds 10 seconds, so that
 my disk won't fill up and new data can still be written?
 
 Thanks!




-- 
Thanks,
Ewen