[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-07-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/872 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-07-10 Thread gyfora
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-120506018 Looks good to merge. If no objections, I will merge it tomorrow. :+1:

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-07-01 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-117644273 Sorry.
* updated the docs (custom partitioning was also missing in the Scala batch API docs)
* added IT case tests (also for the other stream partitioning

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-30 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-117141916 Okay then. These are effects of the change that I did not know about. Let's stick to (2) and later on, we might reconsider this.

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-30 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-117117294 How about we leave the batch API as it is for now and address that as a separate issue? There are quite a few subtleties in how the optimizer assesses equality of

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116647912 By the way, in the Scala DataSet the user should specify the Java `Partitioner[K]` class. Wouldn't it be more convenient to wrap a function like `(K, Int) => Int`
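
The wrapping idea raised in this comment can be sketched roughly as follows. This is a minimal illustration and not the code from the pull request: the `Partitioner` interface is redeclared locally as a stand-in so the snippet compiles without Flink, `FunctionPartitioner` and `FunctionPartitionerDemo` are hypothetical names, and a Java `BiFunction` plays the role of the Scala function `(K, Int) => Int`.

```java
import java.util.function.BiFunction;

// Local stand-in for a partitioner interface (assumption: one method that maps
// a key and the number of partitions to a single partition index).
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical adapter: wraps a plain function so it can be passed wherever
// a Partitioner is expected.
final class FunctionPartitioner<K> implements Partitioner<K> {
    private final BiFunction<K, Integer, Integer> fn;

    FunctionPartitioner(BiFunction<K, Integer, Integer> fn) {
        this.fn = fn;
    }

    @Override
    public int partition(K key, int numPartitions) {
        return fn.apply(key, numPartitions);
    }
}

final class FunctionPartitionerDemo {
    // Routes a String key by its hash code, kept non-negative via floorMod.
    static int route(String key, int numPartitions) {
        Partitioner<String> p =
            new FunctionPartitioner<>((k, n) -> Math.floorMod(k.hashCode(), n));
        return p.partition(key, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(route("flink", 4));
    }
}
```

With such an adapter the user writes only the function, while the API can keep accepting the interface type.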

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116649358 In the batch API, equality of the partitioners is used to determine compatibility of the partitioning. This may at some point become interesting for the streaming
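
Why partitioner equality can matter once functions are wrapped may be easier to see in a small sketch. It does not reproduce how Flink's optimizer compares partitioners; it only shows, under the assumption that compatibility is decided with `equals`, that a class with value-based `equals` compares equal across instances while two separately written lambdas do not on common JVMs. `KeyPartitioner`, `SeededPartitioner`, and `PartitionerEqualityDemo` are hypothetical names.

```java
import java.util.Objects;

// Local stand-in interface (assumption), declared here so the sketch compiles without Flink.
interface KeyPartitioner<K> {
    int partition(K key, int numPartitions);
}

// A partitioner with value-based equality: two instances built with the same
// seed are considered the same partitioning.
final class SeededPartitioner implements KeyPartitioner<Integer> {
    private final int seed;

    SeededPartitioner(int seed) {
        this.seed = seed;
    }

    @Override
    public int partition(Integer key, int numPartitions) {
        return Math.floorMod(key.intValue() ^ seed, numPartitions);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SeededPartitioner
            && ((SeededPartitioner) other).seed == seed;
    }

    @Override
    public int hashCode() {
        return Objects.hash(seed);
    }
}

final class PartitionerEqualityDemo {
    // Two instances with the same seed compare equal.
    static boolean classInstancesEqual() {
        return new SeededPartitioner(7).equals(new SeededPartitioner(7));
    }

    // Two lambdas with identical bodies fall back to Object.equals (identity),
    // so they are not recognised as the same partitioning.
    static boolean lambdasEqual() {
        KeyPartitioner<Integer> a = (k, n) -> Math.floorMod(k.intValue(), n);
        KeyPartitioner<Integer> b = (k, n) -> Math.floorMod(k.intValue(), n);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(classInstancesEqual());
        System.out.println(lambdasEqual());
    }
}
```

A function wrapper would therefore need its own notion of equality, or would have to be treated as never equal, if partitioning compatibility depends on it.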

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116671285 I'd prefer the function implementation (like `(K, Int) => Int`), but it should stay consistent with the batch API. I don't see why the wrapping would affect the

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116736041 Sorry for not making myself clear. I would actually go for 4. Only the Scala function (both in the streaming and batch API) I don't understand

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116710090 I am confused now, what is it going to be?
1. Overloading, such that it accepts both a Scala function and a Partitioner, at the cost of redundant APIs.
2. Only

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116745707 The partitioner function in Scala was simply added as a mirror of the Java API. The batch API is stable, which means that at most we can add a Scala function and

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-29 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116755999 Okay, then I will
* deprecate the partitioner implementation in the batch API
* add the function implementation to the batch API
* add the function

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-27 Thread gyfora
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116106372 I think I could find several use cases if I wanted to :) For example I would often like to broadcast some model information to many downstream operators at once. (not

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-27 Thread gaborhermann
Github user gaborhermann commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-116103719 I guess it is easier for users to understand, and partitioning to multiple channels at a time is rarely needed. Is there a use case where it is needed?

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-26 Thread gyfora
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/872#issuecomment-115832961 Wouldn't it make sense to implement custom partitioning in a way that allows returning an array of indexes, like in the ChannelSelector interface? Returning only 1 index
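
The difference between returning one index and returning an array of indexes could look like the sketch below. Both interfaces are hypothetical stand-ins declared locally: `MultiChannelSelector` is only modeled on the array-returning idea mentioned in the comment, not copied from Flink's `ChannelSelector`, and `ChannelSelectionSketch` is an illustrative name.

```java
import java.util.Arrays;

// Stand-in for a partitioner that picks exactly one target (assumption).
interface SingleIndexPartitioner<K> {
    int partition(K key, int numPartitions);
}

// Stand-in for a selector that may pick several targets at once (assumption).
interface MultiChannelSelector<T> {
    int[] selectChannels(T record, int numChannels);
}

final class ChannelSelectionSketch {
    // A single-index partitioner is a special case of the array form:
    // it always yields a one-element array.
    static <T> MultiChannelSelector<T> fromPartitioner(SingleIndexPartitioner<T> p) {
        return (record, numChannels) -> new int[] { p.partition(record, numChannels) };
    }

    // The array form can also express sending a record to every channel,
    // which a single returned index cannot.
    static <T> MultiChannelSelector<T> broadcast() {
        return (record, numChannels) -> {
            int[] all = new int[numChannels];
            for (int i = 0; i < numChannels; i++) {
                all[i] = i;
            }
            return all;
        };
    }

    public static void main(String[] args) {
        MultiChannelSelector<Integer> single =
            fromPartitioner((key, n) -> Math.floorMod(key.intValue(), n));
        System.out.println(Arrays.toString(single.selectChannels(10, 4))); // prints [2]

        MultiChannelSelector<Integer> all = ChannelSelectionSketch.<Integer>broadcast();
        System.out.println(Arrays.toString(all.selectChannels(10, 4))); // prints [0, 1, 2, 3]
    }
}
```

The trade-off is between the simpler single-index contract and the extra expressiveness of the array form.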

[GitHub] flink pull request: [FLINK-2138] Added custom partitioning to Data...

2015-06-26 Thread gaborhermann
GitHub user gaborhermann opened a pull request: https://github.com/apache/flink/pull/872 [FLINK-2138] Added custom partitioning to DataStream Custom partitioning added to DataStream in order to be more consistent with the batch API. You can merge this pull request into a Git