The comment at https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/ConsumerConfig.scala#L101 suggests that 'consumer.id' should only be set explicitly for testing purposes. Is there a reason it would be a bad idea to set it ourselves in production?
The reason I am asking is that the default value, which starts with the hostname, seems to produce a somewhat sub-optimal distribution of partitions under the lexicographical sort. If the number of partitions is not an exact multiple of the number of consumers, the surplus or deficit tends to be concentrated on just one or two machines. We would much rather have the extra partitions striped evenly across our cluster.

Separately, we would find it useful when debugging if the consumer ID included some application-specific values beyond just the hostname.

Do other people run into this? Are there problems with setting consumer.id in order to affect the distribution of partitions?

Thanks,
-kevin
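
The skew described above can be illustrated with a minimal sketch. It assumes a simplified range-style assignment (sort the consumer thread IDs lexicographically, give each an equal share, and hand the remainder to the first IDs in sort order); it is not Kafka's actual code. The host names, thread counts, and both ID formats are hypothetical, and whether a real consumer.id can produce the second ordering is part of the question.

```python
def range_assign(thread_ids, num_partitions):
    """Simplified range-style assignment (an assumption, not Kafka's code):
    sort IDs, give each floor(P/C) partitions, and give one extra
    partition to each of the first (P mod C) IDs in sort order."""
    ordered = sorted(thread_ids)
    base, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    start = 0
    for index, thread_id in enumerate(ordered):
        count = base + (1 if index < extra else 0)
        assignment[thread_id] = list(range(start, start + count))
        start += count
    return assignment


def partitions_per_host(assignment, host_of):
    """Sum the number of partitions each host ends up owning."""
    totals = {}
    for thread_id, partitions in assignment.items():
        host = host_of[thread_id]
        totals[host] = totals.get(host, 0) + len(partitions)
    return totals


# Hypothetical cluster: 3 hosts, 4 consumer threads each, 14 partitions.
hosts = ["hostA", "hostB", "hostC"]

# Hostname-first IDs: all threads of one host sort next to each other.
default_ids = {f"{h}-{n}": h for h in hosts for n in range(4)}
# Hypothetical striped IDs: thread index first, so hosts interleave.
striped_ids = {f"{n}-{h}": h for h in hosts for n in range(4)}

default_totals = partitions_per_host(range_assign(default_ids, 14), default_ids)
striped_totals = partitions_per_host(range_assign(striped_ids, 14), striped_ids)

print(default_totals)  # {'hostA': 6, 'hostB': 4, 'hostC': 4}
print(striped_totals)  # {'hostA': 5, 'hostB': 5, 'hostC': 4}
```

With hostname-first IDs both surplus partitions land on hostA; with the interleaved ordering they are spread over two hosts. If the assignment is computed per topic, the hostname-first skew would repeat for every topic whose partition count is not a multiple of the thread count.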