[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-1725:
--
Labels: LoadBalancing Partitioner pull-request-available  (was: 
LoadBalancing Partitioner)

> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>Assignee: Anis Nasir
>Priority: Major
>  Labels: LoadBalancing, Partitioner, pull-request-available
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2017-02-16 Thread Anis Nasir (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anis Nasir updated FLINK-1725:
--

I guess the idea is to avoid shuffle grouping. For instance, if you have a
cluster of 1000 machines, you might not want to broadcast message to all
and reduce the key from all the workers. This scheme allows you to only
split on two workers and still get the similar performance. The same
intuition will go for memory requirement using shuffle grouping as well.

Cheers,
Anis



On Thu, Feb 16, 2017 at 8:11 PM, Aljoscha Krettek (JIRA) 



> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>Assignee: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2017-02-16 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-1725:

Component/s: (was: Local Runtime)
 DataStream API

> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>Assignee: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2016-07-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-1725:
---
Component/s: (was: New Components)
 Local Runtime

> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>Assignee: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2015-03-23 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1725:
--
Assignee: Anis Nasir

> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: New Components
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>Assignee: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2015-03-18 Thread Anis Nasir (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anis Nasir updated FLINK-1725:
--
Remaining Estimate: 336h  (was: 1h)
 Original Estimate: 336h  (was: 1h)

> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: New Components
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1725) New Partitioner for better load balancing for skewed data

2015-03-18 Thread Anis Nasir (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anis Nasir updated FLINK-1725:
--
Description: 
Hi,

We have recently studied the problem of load balancing in Storm [1].
In particular, we focused on key distribution of the stream for skewed data.
We developed a new stream partitioning scheme (which we call Partial Key 
Grouping). It achieves better load balancing than key grouping while being more 
scalable than shuffle grouping in terms of memory.

In the paper we show a number of mining algorithms that are easy to implement 
with partial key grouping, and whose performance can benefit from it. We think 
that it might also be useful for a larger class of algorithms.

Partial key grouping is very easy to implement: it requires just a few lines of 
code in Java when implemented as a custom grouping in Storm [2].

For all these reasons, we believe it will be a nice addition to the standard 
Partitioners available in Flink. If the community thinks it's a good idea, we 
will be happy to offer support in the porting.

References:
[1]. 
https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2]. https://github.com/gdfm/partial-key-grouping

  was:
Hi,

We have recently studied the problem of load balancing in Storm [1].
In particular, we focused on key distribution of the stream for skewed skewede 
data.
We developed a new stream partitioning scheme (which we call Partial Key 
Grouping). It achieves better load balancing than key grouping while being more 
scalable than shuffle grouping in terms of memory.

In the paper we show a number of mining algorithms that are easy to implement 
with partial key grouping, and whose performance can benefit from it. We think 
that it might also be useful for a larger class of algorithms.

Partial key grouping is very easy to implement: it requires just a few lines of 
code in Java when implemented as a custom grouping in Storm [2].

For all these reasons, we believe it will be a nice addition to the standard 
Partitioners available in Flink. If the community thinks it's a good idea, we 
will be happy to offer support in the porting.

References:
[1]. 
https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2]. https://github.com/gdfm/partial-key-grouping


> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: New Components
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)