Re: [DISCUSS] New partitioning for better load balancing

2015-04-07 Thread Gianmarco De Francisci Morales
Hi Guozhang, Thanks for your comments. 1) Yes, ordering cannot be guaranteed in PKG. In general, algorithms that use PKG should compute commutative and associative functions of the input. If you need strict ordering (i.e., the function is not commutative) within a partition, use KG. 2) I am not
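
To illustrate point 1, here is a minimal sketch of why commutative and associative functions tolerate PKG. The two-worker split and the helper names `partial_counts` and `merge` are assumptions made for illustration; they do not come from the paper or from Kafka.

```python
from collections import Counter

def partial_counts(messages):
    """Count the keys seen by one worker (hypothetical helper).

    Counting is commutative and associative, so arrival order is irrelevant.
    """
    return Counter(messages)

def merge(*partials):
    """Combine per-worker partial counts into one final result."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Under PKG, messages for the same key ("a") may land on two workers.
worker_1 = partial_counts(["a", "b", "a"])
worker_2 = partial_counts(["a", "c"])

# Merging in either order gives the same totals as a single-worker count.
combined = merge(worker_1, worker_2)
```

A function that depends on the order of messages for a key (for example, "last value wins") would not merge this way, which is the case where KG remains the right choice.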

Re: [DISCUSS] New partitioning for better load balancing

2015-04-07 Thread Guozhang Wang
I see, thanks for the clarification. Guozhang On Tue, Apr 7, 2015 at 1:50 AM, Gianmarco De Francisci Morales g...@apache.org wrote: Hi Guozhang, Thanks for your comments. 1) Yes, ordering cannot be guaranteed in PKG. In general, algorithms that use PKG should compute commutative and

Re: [DISCUSS] New partitioning for better load balancing

2015-04-06 Thread Guozhang Wang
Gianmarco, I browsed through your paper (congrats on the ICDE publication BTW!), and here are some questions / comments on the algorithm: 1. One motivation for enabling key-based partitioning in Kafka is to achieve per-key ordering, i.e. with all messages with the same key sent to the same

Re: [DISCUSS] New partitioning for better load balancing

2015-04-05 Thread Gianmarco De Francisci Morales
Hi Jay, Thanks, that sounds like a necessary step. I guess I expected something like that to be already there, at least internally. I created KAFKA-2092 to track the PKG integration. Cheers, -- Gianmarco On 4 April 2015 at 23:50, Jay Kreps jay.kr...@gmail.com wrote: Hey guys, I think the first

Re: [DISCUSS] New partitioning for better load balancing

2015-04-04 Thread Jay Kreps
Hey guys, I think the first step here would be to expose a partitioner interface for the new producer that would make it easy to plug in these different strategies. I filed a JIRA for this: https://issues.apache.org/jira/browse/KAFKA-2091 -Jay On Fri, Apr 3, 2015 at 9:36 AM, Harsha
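
As a rough sketch of what a pluggable partitioner could look like, the following defines a hypothetical `Partitioner` base class with two interchangeable strategies. The class names and the `partition(key, num_partitions)` signature are assumptions for illustration only; this is not the interface proposed in KAFKA-2091.

```python
from abc import ABC, abstractmethod
import zlib

class Partitioner(ABC):
    """Hypothetical strategy interface: map a message key to a partition."""

    @abstractmethod
    def partition(self, key: bytes, num_partitions: int) -> int:
        ...

class KeyGrouping(Partitioner):
    """Same key always goes to the same partition (preserves per-key order)."""

    def partition(self, key: bytes, num_partitions: int) -> int:
        return zlib.crc32(key) % num_partitions

class RoundRobin(Partitioner):
    """Ignores the key and cycles through the partitions in turn."""

    def __init__(self) -> None:
        self._next = 0

    def partition(self, key: bytes, num_partitions: int) -> int:
        chosen = self._next % num_partitions
        self._next += 1
        return chosen

def route(partitioner: Partitioner, key: bytes, num_partitions: int) -> int:
    """The sender depends only on the interface, not on the strategy."""
    return partitioner.partition(key, num_partitions)
```

With such an interface, a strategy like PKG would be one more subclass, and the sending code would not need to change.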

[DISCUSS] New partitioning for better load balancing

2015-04-03 Thread Gianmarco De Francisci Morales
Hi, We have recently studied the problem of load balancing in distributed stream processing systems such as Samza [1]. In particular, we focused on what happens when the key distribution of the stream is skewed when using key grouping. We developed a new stream partitioning scheme (which we call
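
A minimal sketch of the idea behind PKG (Partial Key Grouping), assuming the scheme as described in the ICDE paper mentioned in this thread: each key has two candidate partitions chosen by two hash functions, and the sender routes each message to whichever candidate it has sent fewer messages to so far. The class name, the salted MD5 hashing, and the local load counters are illustrative assumptions, not the paper's reference code.

```python
import hashlib

class PartialKeyGrouper:
    """Illustrative PKG-style router based on two hash choices per key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # Load estimated locally: messages this sender routed to each partition.
        self.load = [0] * num_partitions

    def _hash(self, key, salt):
        # Two different salts stand in for two independent hash functions.
        digest = hashlib.md5((salt + key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def partition(self, key):
        first = self._hash(key, "h1:")
        second = self._hash(key, "h2:")
        # Pick the less loaded of the two candidates (ties go to the first).
        chosen = first if self.load[first] <= self.load[second] else second
        self.load[chosen] += 1
        return chosen

# A single hot key is spread over at most two partitions instead of one.
grouper = PartialKeyGrouper(num_partitions=8)
hot_key_partitions = {grouper.partition("hot") for _ in range(1000)}
```

Because a key can reach two partitions, downstream consumers hold partial state per key and must merge it, which is why per-key ordering is not preserved.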

Re: [DISCUSS] New partitioning for better load balancing

2015-04-03 Thread Harsha
Gianmarco, I am coming from the Storm community. I think PKG is very interesting and we can provide an implementation of Partitioner for PKG. Can you open a JIRA for this? -- Harsha On April 3, 2015 at 4:49:15 AM, Gianmarco De Francisci Morales