Re: Kafka topic naming conventions

2015-03-19 Thread Renato Marroquín Mogrovejo
There was an interesting discussion over on the Kafka mailing list [1] that
might give you more ideas, Roger.
They don't mention anything about the number of partitions when naming
topics, but maybe it helps anyway.


Renato M.

[1] https://www.mail-archive.com/users@kafka.apache.org/msg11976.html

2015-03-19 5:43 GMT+01:00 Roger Hoover roger.hoo...@gmail.com:

 Thanks, guys.  I was also playing around with including partition count and
 even the partition key in the topic name.   My thought was that topics may
 have the same data and number of partitions but only differ by partition
 key.  After a while, the naming does get crazy (too long and ugly).  We
 really need a topic metadata store.



Re: Kafka topic naming conventions

2015-03-19 Thread Roger Hoover
Renato,

Thanks for the link.  Some interesting suggestions there as well.

On Thu, Mar 19, 2015 at 2:28 AM, Renato Marroquín Mogrovejo 
renatoj.marroq...@gmail.com wrote:

 There was an interesting discussion over on the Kafka mailing list [1] that
 might give you more ideas, Roger.
 They don't mention anything about the number of partitions when naming
 topics, but maybe it helps anyway.


 Renato M.

 [1] https://www.mail-archive.com/users@kafka.apache.org/msg11976.html



Re: Kafka topic naming conventions

2015-03-18 Thread Chris Riccomini
Hey Roger,

We haven't thought about this in great detail. People do all kinds of wacky
things in practice. We have some like AdViewsByMemberId, and various
permutations of that.

One thing I haven't seen, but that might be relevant, is including partition
counts in the topic name. If you're doing joins, you kind of care about both
the join key and the partition count.

Sorry I don't have better guidance. :/

Cheers,
Chris

On Wed, Mar 18, 2015 at 5:23 PM, Roger Hoover roger.hoo...@gmail.com
wrote:

 Hi,

 Wondering what naming conventions people are using for topics in Kafka.
 When there's re-partitioning involved, you can end up with multiple topics
 that have the exact same data but are partitioned differently.  How do you
 name them?

 Thanks,

 Roger
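Chris's point that a join cares about both the join key and the partition count can be made concrete. A minimal sketch, assuming each topic's partition key and partition count are recorded somewhere, and that both producers use the same partitioner; the TopicInfo type, the check_join_inputs helper, and the ProfilesByMemberId topic are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicInfo:
    """Hypothetical record of how a topic is partitioned."""
    name: str
    partition_key: str  # field the producer partitions on
    partitions: int     # number of partitions


def check_join_inputs(left: TopicInfo, right: TopicInfo, join_key: str) -> None:
    """Raise ValueError unless both topics are partitioned on join_key
    into the same number of partitions (i.e. they are co-partitioned)."""
    for topic in (left, right):
        if topic.partition_key != join_key:
            raise ValueError(
                f"{topic.name} is partitioned by {topic.partition_key!r}, "
                f"not by join key {join_key!r}"
            )
    if left.partitions != right.partitions:
        raise ValueError(
            f"partition counts differ: {left.name}={left.partitions}, "
            f"{right.name}={right.partitions}"
        )


views = TopicInfo("AdViewsByMemberId", "memberId", 32)
profiles = TopicInfo("ProfilesByMemberId", "memberId", 32)
check_join_inputs(views, profiles, "memberId")  # passes: same key, same count
```

When both checks pass, partition i of one topic holds the same keys as partition i of the other, so each task can join its partitions locally.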



Re: Kafka topic naming conventions

2015-03-18 Thread Chris Riccomini
Hey Chinmay,

Cool, this is good feedback. I didn't think I was *that* crazy. :)

Cheers,
Chris

On Wed, Mar 18, 2015 at 6:10 PM, Chinmay Soman chinmay.cere...@gmail.com
wrote:

 That's what we're doing as well: appending the partition count to the Kafka
 topic name. This actually helps keep track of the #partitions for each
 topic (since Kafka doesn't have a metadata store yet).

 In case of topic expansion, we actually just resort to creating a new
 topic. Although that is an overhead, the thought process is that this will
 minimize operational errors. This is also necessary in case we're
 doing some kind of joins.


 --
 Thanks and regards

 Chinmay Soman



Re: Kafka topic naming conventions

2015-03-18 Thread Chris Riccomini
Hey Jakob,

 Yeah, but then if you change the partition count later on, you've got
 incorrect information forever.

You're right. But IMO this further reinforces that you *can't* change
partition counts on a topic that you're using for a JOIN. This completely
breaks the operation.

Agree that it's just best effort, and kind of hacky. Was just a thought. I
haven't seen anyone actually do this.

Cheers,
Chris

On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan jgho...@gmail.com wrote:

 On 18 March 2015 at 17:48, Chris Riccomini criccom...@apache.org wrote:
  One thing I haven't seen, but might be relevant, is including partition
  counts in the topic.

 Yeah, but then if you change the partition count later on, you've got
 incorrect information forever. Or you need to create a new stream,
 which might be a nice forcing function to make sure your join isn't
 screwed up.  There'd need to be something somewhere to enforce that
 though.
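Why changing the partition count breaks a join: a keyed producer picks a partition roughly as hash(key) mod N, so changing N moves most keys to a different partition number. A minimal sketch; zlib.crc32 is only a stand-in hash (Kafka's Java producer does not use CRC32 for this), and partition_for and moved_keys are illustrative names:

```python
import zlib
from typing import Iterable, List


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in keyed partitioner: hash(key) mod num_partitions.

    CRC32 is used only because it is deterministic and in the stdlib;
    it is not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys: Iterable[str], old_n: int, new_n: int) -> List[str]:
    """Keys that land in a different partition after going from old_n to new_n."""
    return [k for k in keys if partition_for(k, old_n) != partition_for(k, new_n)]


keys = [f"member-{i}" for i in range(1000)]
moved = moved_keys(keys, 8, 12)
print(f"{len(moved)} of {len(keys)} keys change partition going from 8 to 12")
```

Going from 8 to 12 partitions, a key keeps its partition only when its hash mod 24 is below 8, so with an evenly spread hash roughly two thirds of the keys move. If only one side of a join is expanded, matching keys end up in different partition numbers on the two topics.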



Re: Kafka topic naming conventions

2015-03-18 Thread Chinmay Soman
That's what we're doing as well: appending the partition count to the Kafka
topic name. This actually helps keep track of the #partitions for each
topic (since Kafka doesn't have a metadata store yet).

In case of topic expansion, we actually just resort to creating a new
topic. Although that is an overhead, the thought process is that this will
minimize operational errors. This is also necessary in case we're
doing some kind of joins.


On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan jgho...@gmail.com wrote:

 On 18 March 2015 at 17:48, Chris Riccomini criccom...@apache.org wrote:
  One thing I haven't seen, but might be relevant, is including partition
  counts in the topic.

 Yeah, but then if you change the partition count later on, you've got
 incorrect information forever. Or you need to create a new stream,
 which might be a nice forcing function to make sure your join isn't
 screwed up.  There'd need to be something somewhere to enforce that
 though.




-- 
Thanks and regards

Chinmay Soman
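A minimal sketch of the name-suffix approach, assuming a hypothetical "<base>_p<count>" format (the thread does not say which format Chinmay's team actually uses). It relies only on the fact that legal Kafka topic names consist of ASCII letters, digits, '.', '_' and '-'. Following the practice described above, expansion derives a new topic name rather than altering the existing topic; topic_name, parse_topic_name and expanded_topic are illustrative helpers:

```python
import re
from typing import Tuple

# Hypothetical naming convention: "<base>_p<count>", e.g. "PageViews_p32".
# Legal Kafka topic names use only ASCII letters, digits, '.', '_' and '-'.
_LEGAL = re.compile(r"[A-Za-z0-9._-]+")
_SUFFIX = re.compile(r"(?P<base>.+)_p(?P<count>[1-9][0-9]*)")


def topic_name(base: str, partitions: int) -> str:
    """Build a topic name that carries its partition count."""
    if partitions < 1:
        raise ValueError("partition count must be positive")
    name = f"{base}_p{partitions}"
    if not _LEGAL.fullmatch(name):
        raise ValueError(f"illegal Kafka topic name: {name!r}")
    return name


def parse_topic_name(name: str) -> Tuple[str, int]:
    """Split a name built by topic_name() into (base, partition count)."""
    m = _SUFFIX.fullmatch(name)
    if m is None:
        raise ValueError(f"no partition-count suffix in {name!r}")
    return m.group("base"), int(m.group("count"))


def expanded_topic(name: str, new_partitions: int) -> str:
    """On expansion, derive a new topic name instead of altering the old topic."""
    base, old_partitions = parse_topic_name(name)
    if new_partitions <= old_partitions:
        raise ValueError("expansion must increase the partition count")
    return topic_name(base, new_partitions)


print(topic_name("PageViews", 32))          # PageViews_p32
print(expanded_topic("PageViews_p32", 64))  # PageViews_p64
```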


Kafka topic naming conventions

2015-03-18 Thread Roger Hoover
Hi,

Wondering what naming conventions people are using for topics in Kafka.
When there's re-partitioning involved, you can end up with multiple topics
that have the exact same data but are partitioned differently.  How do you
name them?

Thanks,

Roger


Re: Kafka topic naming conventions

2015-03-18 Thread Chinmay Soman
Yeah! It does seem a bit hackish, but I think this approach promises fewer
config/operation errors.

That said, I think some of these checks could be built into Samza: assuming
Kafka gets a metadata store in the near future, the Samza container could
validate the #topics against this store.

On Wed, Mar 18, 2015 at 6:16 PM, Chris Riccomini criccom...@apache.org
wrote:

 Hey Chinmay,

 Cool, this is good feedback. I didn't think I was *that* crazy. :)

 Cheers,
 Chris

-- 
Thanks and regards

Chinmay Soman
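The startup check suggested above might look roughly like this. Everything here is hypothetical: the "<base>_p<count>" naming format, the helper names, and the plain dict standing in for a metadata store (the thread notes Kafka had no such store yet, so the actual counts would have to come from wherever partition metadata is available):

```python
import re
from typing import Iterable, List, Mapping

# Hypothetical convention: "<base>_p<count>", e.g. "PageViews_p32".
_SUFFIX = re.compile(r".+_p([1-9][0-9]*)")


def partition_count_problems(topics: Iterable[str],
                             actual_counts: Mapping[str, int]) -> List[str]:
    """Compare the count encoded in each topic name with actual_counts,
    which stands in for a (hypothetical) topic metadata store."""
    problems = []
    for name in topics:
        m = _SUFFIX.fullmatch(name)
        if m is None:
            problems.append(f"{name}: no partition-count suffix")
            continue
        expected = int(m.group(1))
        actual = actual_counts.get(name)
        if actual is None:
            problems.append(f"{name}: not found in metadata")
        elif actual != expected:
            problems.append(f"{name}: name says {expected}, metadata says {actual}")
    return problems


def validate_or_fail(topics: Iterable[str], actual_counts: Mapping[str, int]) -> None:
    """Refuse to start if any input topic disagrees with the metadata."""
    problems = partition_count_problems(topics, actual_counts)
    if problems:
        raise RuntimeError("; ".join(problems))
```

For example, with metadata {"PageViews_p32": 32, "Clicks_p8": 16}, checking ["PageViews_p32", "Clicks_p8"] reports that Clicks_p8's name says 8 while the metadata says 16. That is the stale-name case Jakob raised, where partitions were added after the topic was named.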


Re: Kafka topic naming conventions

2015-03-18 Thread Roger Hoover
Thanks, guys.  I was also playing around with including partition count and
even the partition key in the topic name.   My thought was that topics may
have the same data and number of partitions but only differ by partition
key.  After a while, the naming does get crazy (too long and ugly).  We
really need a topic metadata store.

On Wed, Mar 18, 2015 at 6:21 PM, Chinmay Soman chinmay.cere...@gmail.com
wrote:

 Yeah! It does seem a bit hackish, but I think this approach promises fewer
 config/operation errors.

 That said, I think some of these checks could be built into Samza: assuming
 Kafka gets a metadata store in the near future, the Samza container could
 validate the #topics against this store.

 --
 Thanks and regards

 Chinmay Soman
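If names get too long once they carry both the partition key and the partition count, one alternative is to keep names short and record that information in a topic metadata store, as Roger suggests. This also answers the original question of finding topics that hold the same data partitioned differently. A minimal in-memory sketch; TopicRegistry and TopicMeta are hypothetical, and a real store would need to be shared and persistent:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TopicMeta:
    dataset: str        # the logical data, e.g. "AdViews"
    partition_key: str  # field the topic is partitioned on
    partitions: int     # partition count


class TopicRegistry:
    """Hypothetical in-memory topic metadata store."""

    def __init__(self) -> None:
        self._topics: Dict[str, TopicMeta] = {}

    def register(self, topic: str, meta: TopicMeta) -> None:
        if topic in self._topics:
            raise ValueError(f"{topic} is already registered")
        self._topics[topic] = meta

    def copies_of(self, dataset: str) -> List[str]:
        """All topics holding the same data, however they are partitioned."""
        return sorted(t for t, m in self._topics.items() if m.dataset == dataset)

    def find(self, dataset: str, partition_key: str, partitions: int) -> Optional[str]:
        """The topic (if any) holding this data with this key and count."""
        wanted = TopicMeta(dataset, partition_key, partitions)
        for topic, meta in self._topics.items():
            if meta == wanted:
                return topic
        return None


registry = TopicRegistry()
registry.register("AdViews", TopicMeta("AdViews", "adId", 16))
registry.register("AdViewsByMemberId", TopicMeta("AdViews", "memberId", 32))
```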