[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1075 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread tillrohrmann
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136619441 @s1ck, it's important to note that `1` will be subtracted from `getRuntimeContext().getNumberOfParallelSubtasks()` and not `getBitSize()`. The reason is that we

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread tillrohrmann
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136655355 No problem @s1ck. It might be a bit redundant but it tests that the forwarding is done correctly. Therefore, I fixed the test case. --- If your project is set up

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136658385 Ok, thank you. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136633442 @tillrohrmann of course you are right, I thought wrong about it. it's committed --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136654131 Sorry, I did not see that there are also identical test cases in Scala which now fail due to the `-1` change. As those scala methods wrap the Java methods, is it necessary

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread tillrohrmann
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136640020 @s1ck, looks really good. Thanks for your contribution. Will merge it now. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136426216 @tillrohrmann While writing the new tests for both methods, I encountered that `zipWithIndex` is broken, too. It sometimes throws `ConcurrentModificationException`. This

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136427373 There is an issue that tracks the `ConcurrentModificationException`problem. As per discussion in that issue, can you use a `BroadcastVariableInitializer`? Safes

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136431165 @StephanEwen thx for the hint. works fine! Will cleanup and commit now. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136445029 @tillrohrmann I did not include the `shifter = getBitSize(getRuntimeContext().getNumberOfParallelSubtasks() - 1)` as your hint only applies for power of 2 values. E.g.,

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136545997 Ah, thank you for the proof. And didn`t see the log2 in detail before, sorry. --- If your project is set up for it, you can reply to this email and have your

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1075#discussion_r38305177 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java --- @@ -121,6 +122,7 @@ public void mapPartition(Iterable values,

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136352327 +1 for a test, otherwise this looks good! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136382814 @s1ck Good idea. You can also call `collect()`, add the IDs to a set and make sure the set has the right cardinality. In general, avoiding temp files and Strings

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136380943 There is already a test case for zipWithUniqueId() in

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread tillrohrmann
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136386406 @s1ck, the `testZipWithUniqueId` test is bogus. You can remove this test case an replace it with your described test. It would also be great if you could set the

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-30 Thread rmetzger
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136129581 Thanks a lot for the contribution. Can you add a test case for the method to make sure the issue is not re-introduced again when somebody else is changing the code?

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-30 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/1075#issuecomment-136245681 @rmetzger +1. I think add a test is helpful. Otherwise can you give us a infomation that prove the 'id = (counter shifter) + taskId; ' will never generate the

[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-29 Thread s1ck
GitHub user s1ck opened a pull request: https://github.com/apache/flink/pull/1075 [FLINK-2590] fixing DataSetUtils.zipWithUniqueId() * modified algorithm as explained in the issue * updated method documentation You can merge this pull request into a Git repository by running: