Hey Sriharsha,

Great, thanks!

For 4:

Yeah the use case for init and close is making use of any kind of metadata.
An example of this would be if you are trying to do range partitioning you
need to map lexicographic ranges to numeric partitions. You might do this
by adding a new property to the config such as
   partitioner.metadata=a:0, b:1, ..., z:26
Or likewise you might have a partitioner built using Java's digest
interface and the config would be the digest algorithm name.

Or you might need to maintain this dynamically and have the partitioner
fetch this list on initialization from some central store (zk or whatever).

The init method we should use is the Configurable interface, that will
automatically pass in the configuration given to the producer at
instantiation time.

I agree that these additional methods add a bit of complexity and often
aren't needed, but on the flip side it is often hard to use the interface
without them when you do need them.

5. Yeah that's how the current partitioner works, but that is just because
it is a non-public interface. It's not clear to me if the partitioner
should override the partition or not. We could either say:
a. The partitioner is the default policy and the partition field is a way
to override that on a per-record basis for cases where you need that or
where it is simpler. If this is our description then the partitioner should
only take effect if partition==null
b. The partition the user specifies is just a suggestion and the
partitioner can interpret or override that in whatever way they want.

I think (a) may actually make more sense. The reason is because otherwise
the behavior of the partition field in ProducerRecord will be very hard to
depend on as the effect it has will be totally dependent on the partitioner
that is set. Any correct partitioner will basically have to implement the
case where the partition is set and I think the only sensible thing then is
to use it as the partition (right?).

Dunno, what do you think...?

-Jay

On Thu, Apr 23, 2015 at 2:59 PM, Sriharsha Chintalapani <
harsh...@fastmail.fm> wrote:

> Hi Jay,
>          Sorry about the KIP formatting . I fixed those in the KIP.
>
> 2. We certainly need to add both the serialized and unserialized form for
> the key as both are useful.
>
> I added those to the interface.
>
> 3. Do we need to add the value? I suspect people will have uses for
> computing something off a few fields in the value to choose the partition.
> This would be useful in cases where the key was being used for log
> compaction purposes and did not contain the full information for computing
> the partition.
>
> added it as well.
>
> 4. This interface doesn't include either an init() or close() method. It
> should implement Closable and Configurable, right?
>
> I am not quite sure about having init() or close() for partitioner. Are we
> looking at partitioner using some external resources to initialize and
> close. If thats the case than init() should also take in some config as
> param, this can add more complexity.
>
>
> 5. What happens if the user both sets the partition id in the
> ProducerRecord and sets a partitioner? Does the partition id just get
> passed in to the partitioner (as sort of implied in this interface?). This
> is a bit weird since if you pass in the partition id you kind of expect it
> to get used, right? Or is it the case that if you specify a partition the
> partitioner isn't used at all (in which case no point in including
> partition in the Partitioner api).
>
> In current Producer Record partition id is getting passed to Partitioner.
> If a custom partitioner is not going to use that than thats up to their
> implementation  right. Similarly in our interface we’ve Value as another
> param this may or may not be used. Essentially its up to the Partitioner to
> disclose on what available information they are going to partition against.
>
> Thanks,
> Harsha
>
>
> On April 23, 2015 at 9:11:33 AM, Jay Kreps (jay.kr...@gmail.com) wrote:
>
> Hey Harsha,
>
> A few comments:
>
> Can you finish up the KIP there are some unfinished sentences and odd
> whitespace things going on.
>
> Here are the questions I think we should consider:
> 1. Do we need this at all given that we have the partition argument in
> ProducerRecord which gives full control? I think we do need it because
> this
> is a way to plug in a different partitioning strategy at run time and do
> it
> in a fairly transparent way.
> 2. We certainly need to add both the serialized and unserialized form for
> the key as both are useful.
> 3. Do we need to add the value? I suspect people will have uses for
> computing something off a few fields in the value to choose the partition.
> This would be useful in cases where the key was being used for log
> compaction purposes and did not contain the full information for computing
> the partition.
> 4. This interface doesn't include either an init() or close() method. It
> should implement Closable and Configurable, right?
> 5. What happens if the user both sets the partition id in the
> ProducerRecord and sets a partitioner? Does the partition id just get
> passed in to the partitioner (as sort of implied in this interface?). This
> is a bit weird since if you pass in the partition id you kind of expect it
> to get used, right? Or is it the case that if you specify a partition the
> partitioner isn't used at all (in which case no point in including
> partition in the Partitioner api).
>
> Cheers,
>
> -Jay
>
> On Thu, Apr 23, 2015 at 6:55 AM, Sriharsha Chintalapani <ka...@harsha.io>
> wrote:
>
> > Hi,
> > Here is the KIP for adding a partitioner interface for producer.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-+22+-+Expose+a+Partitioner+interface+in+the+new+producer
> > There is one open question about how interface should look like. Please
> > take a look and let me know if you prefer one way or the other.
> >
> > Thanks,
> > Harsha
> >
> >
>
>

Reply via email to