[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

2016-09-28 Thread s1ck
Github user s1ck commented on the issue:

https://github.com/apache/flink/pull/2527
  
@greghogan since I addressed all your comments, is there anything left to 
do that prevents this from merging?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

2016-09-23 Thread s1ck
Github user s1ck commented on the issue:

https://github.com/apache/flink/pull/2527
  
> I'd also like to add a ToNullValue translator that would accept any type 
and convert to NullValue.

I don't know if I get this right. Do you mean an additional test case with 
`NullValue` as vertex/edge?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

2016-09-23 Thread s1ck
Github user s1ck commented on the issue:

https://github.com/apache/flink/pull/2527
  
> Would a graph translator simplify the conversion from Long to String? You 
can do graph.run(new TranslateEdgeValues<...>(new StringToLong()) and write a 
simple public class StringToLong implements TranslateFunction<String, Long> { 
 Or this could be an anonymous class.

I was not aware of this function / interface. I just needed it to use the 
same data for String and Long tests. I will update the conversion to your 
suggested method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

2016-09-23 Thread s1ck
Github user s1ck commented on the issue:

https://github.com/apache/flink/pull/2527
  
> I'm not following why specifying the TypeInformation is now required with 
the change to using Either. Is the type system failing to handle this properly?
You are right, I was experimenting with return type definitions for the 
groupReduce and the map call. However, for the map call in 
`Summarization.java:126` the return type definition is required or the job 
fails as it cannot determine the type for `VV`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

2016-09-23 Thread s1ck
Github user s1ck commented on a diff in the pull request:

https://github.com/apache/flink/pull/2527#discussion_r80186110
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/Summarization.java
 ---
@@ -226,11 +247,15 @@ public void setGroupRepresentativeId(K 
groupRepresentativeId) {
}
 
public VGV getVertexGroupValue() {
-   return f2;
+   return f2.isLeft() ? f2.left() : null;
}
 
public void setVertexGroupValue(VGV vertexGroupValue) {
-   f2 = vertexGroupValue;
+   if (vertexGroupValue == null) {
+   f2 = new 
Either.Right<>(NullValue.getInstance());
--- End diff --

An instance of `VertexGroupItem` is reused in the `VertexGroupReducer`. 
Here the setter is implicitly only called once in the open method. So I thought 
reusing the new Right<>(NullValue.getInstance()) wouldn't be a benefit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

2016-09-21 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/2527

[FLINK-4624] Allow for null values in Graph Summarization

* Bug was caused by serializers that cannot handle null values (e.g. Long)
* VertexGroupItem now uses Either<NullValue, VV> instead of VV
* Generalized test cases
* Added tests for vertex/edge values of type Long

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-4624

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2527


commit e6db894d6b84cf95206905a6f1a6713f78e32988
Author: Martin Junghanns <martin.jungha...@gmx.net>
Date:   2016-09-21T11:31:41Z

[FLINK-4624] Support null values in Graph Summarization

* Bug was caused by serializers that cannot handle null values (e.g. Long)
* VertexGroupItem now uses Either<NullValue, VV> instead of VV
* Generalized test cases
* Added tests for vertex/edge values of type Long




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3611] [docs] corrected link in CONTRIBU...

2016-03-14 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1786#issuecomment-196175512
  
Sounds good, but then I would change the text to:
`A detailed explanation can be found in our [Contribute Code 
Guide](http://flink.apache.org/contribute-code.html) which also contains a list 
of coding guidelines that you should follow.`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3611] [docs] corrected link in CONTRIBU...

2016-03-13 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1786

[FLINK-3611] [docs] corrected link in CONTRIBUTING.md



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-3611

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1786


commit f79cbedcfd82de995cf0a8df6fe9fb3601e4f6d8
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2016-03-13T23:08:10Z

[FLINK-3611] [docs] corrected link in CONTRIBUTING.md




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3272] [Gelly] Generic value in Connecte...

2016-03-13 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1785

[FLINK-3272] [Gelly] Generic value in Connected Components

* Updated scatter-gather and GSA implementation + tests
* updated documentation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-3272

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1785


commit 9ad04a672b1b277d1853e94f7dd792ef8448c19d
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2016-03-13T17:01:52Z

[FLINK-3272] [Gelly] Generic value in Connected Components

* Updated scather-gather and GSA implementation * tests
* updated documentation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3207] [gelly] adds the vertex-centric i...

2016-03-09 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1575#issuecomment-194251517
  
Ok, I will keep that in mind. Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3207] [gelly] adds the vertex-centric i...

2016-03-08 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1575#issuecomment-193968765
  
Hi @vasia this is a nice PR, I commented some minor documentation / naming 
issues
+1 for merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3122] [Gelly] Generic vertex value in l...

2016-01-19 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1521

[FLINK-3122] [Gelly] Generic vertex value in label propagation

* updated algorithm and tests
* updated / corrected algorithm documentation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-3122

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1521.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1521


commit 43004e83a4572837497ea5952a58469af9e277b5
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2016-01-19T18:11:19Z

[FLINK-3122] [Gelly] Generic vertex value in label propagation

* updated algorithm and tests
* updated / corrected algorithm documentation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3118] [Gelly] Consider ResultTypeQuerya...

2016-01-06 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1471#issuecomment-169421134
  
Hi, sorry for the delay. I'll add the test asap.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3118] [Gelly] Consider ResultTypeQuerya...

2015-12-21 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1471#issuecomment-166341381
  
Sorry for missing javadoc, I was inspired by the other methods in 
`TypeExtractor` ;) The test will be a generic version of `LabelPropagation`, 
described in https://issues.apache.org/jira/browse/FLINK-3122 Would that be 
enough for a test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3118] [Gelly] Consider ResultTypeQuerya...

2015-12-18 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1471

[FLINK-3118] [Gelly] Consider ResultTypeQueryable in Message and 
GatherFunction



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-3118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1471.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1471


commit 67fb836cf68d7385c38d708b05d7d8826b53a34e
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2015-12-18T14:01:24Z

[FLINK-3118] [Gelly] Consider ResultTypeQueryable in MessageFunction and 
GatherFunction




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3064] [core] Add size check in GroupRed...

2015-11-23 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1396

[FLINK-3064] [core] Add size check in GroupReduceOperatorBase



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-3064

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1396


commit 8efb4758e32545a44dad9e52ac5b7b08bb75d8a9
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2015-11-23T19:07:43Z

[FLINK-3064] [core] Add size check in GroupReduceOperatorBase




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2981] [docs] update requirements in doc...

2015-11-06 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1335

[FLINK-2981] [docs] update requirements in docs/README.md

* added libraries and versions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-2981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1335


commit 48873ab14b0abf20871d2a3aa4e479edb0ffb710
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2015-11-06T10:33:43Z

[FLINK-2981] [docs] update requirements in docs/README

* added libraries and versions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2905] [gelly] Add Graph Intersection me...

2015-11-05 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1329#issuecomment-154057136
  
Hey!
you're welcome :) I will fix the comments later today. There is no 
difference, `graph1.intersect(graph2)` is just a convenience method if the user 
wants the default behaviour, which is `graph1.intersect(graph2, true)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2905] [gelly] Add Graph Intersection me...

2015-11-05 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1329#issuecomment-154061114
  
yeah, that's fine with me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2905] [gelly] Add Graph Intersection me...

2015-11-04 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1329

[FLINK-2905] [gelly] Add Graph Intersection method

* added two strategies for edge intersection
* added Scala methods
* added tests
* updated docs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-2905

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1329


commit ac7dde016e4284a6deb3bfe4529e27968899f825
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2015-10-29T13:42:37Z

[FLINK-2905] [gelly] Add Graph Intersection method

* added two strategies for edge intersection
* added Scala methods
* added tests
* updated docs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2411] [gelly] Add Summarization Algorit...

2015-10-20 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1269

[FLINK-2411] [gelly] Add Summarization Algorithm

* implemented algorithm
* implemented integration tests
* updated gelly guide

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-2411

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1269


commit 2e1f4a5e6df91fe0b8754165dffd76169c40e496
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2015-10-20T07:28:40Z

[FLINK-2411] [gelly] Add Summarization Algorithm

* implemented algorithm
* implemented integration tests
* updated gelly guide




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2411] Add graph summarization algorithm

2015-10-20 Thread s1ck
Github user s1ck closed the pull request at:

https://github.com/apache/flink/pull/1264


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2411] Add graph summarization algorithm

2015-10-19 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1264#issuecomment-149149391
  
Hi @vasia,
I will update the docs asap and append it to the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2411] Add graph summarization algorithm

2015-10-17 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1264

[FLINK-2411] Add graph summarization algorithm

* added algorithm
* added integration tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-2411

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1264


commit 3ac42a6c23fc900b6451174b5b463cdba60951da
Author: Martin Junghanns <m.jungha...@mailbox.org>
Date:   2015-10-17T14:59:32Z

[FLINK-2411] Add graph summarization algorithm

* added algorithm
* added integration tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136658385
  
Ok, thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136633442
  
@tillrohrmann of course you are right, I thought wrong about it. it's 
committed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-09-01 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136654131
  
Sorry, I did not see that there are also identical test cases in Scala 
which now fail due to the `-1` change. As those scala methods wrap the Java 
methods, is it necessary to run the same tests on them again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136426216
  
@tillrohrmann While writing the new tests for both methods, I encountered 
that `zipWithIndex` is broken, too. It sometimes throws 
`ConcurrentModificationException`. This is because each task sorts a 
broadcasted list in the `open` method. This could not fail before due to 
parallelism = 1.
I would fix this by creating a local copy of that list (which should be 
small in that specific case). Shall I fix this in the same issue or do you want 
me to create a new issue for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136431165
  
@StephanEwen thx for the hint. works fine! Will cleanup and commit now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136445029
  
@tillrohrmann I did not include the `shifter = 
getBitSize(getRuntimeContext().getNumberOfParallelSubtasks() - 1)` as your hint 
only applies for power of 2 values. E.g., `getBitSize(7)` returns 3 and we need 
3 bits to cover the range from 0 to 6.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-31 Thread s1ck
Github user s1ck commented on the pull request:

https://github.com/apache/flink/pull/1075#issuecomment-136380943
  
There is already a test case for zipWithUniqueId() in 
https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/util/DataSetUtilsITCase.java#L66
However, this test is under the assumption that there is only one task 
running, which is why it did not fail in the first place.
If there are multiple tasks, the resulting unique id is not deterministic 
for a single dataset element. I would implement a test, that creates a dataset, 
applies the `zipWithUniqueId` method, calls `distinct(0)` on the created ids 
and checks the number of resulting elements (must be equal to the input 
dataset). Would this be sufficient?
Furthermore, the current test cases for `DataSetUtils` assume a resulting 
dataset as string and check this after each test run. My proposed test would 
not fit in that scheme. Should I create a new test case class for this method?

@StephanEwen I wanted to do this, but static doesn't work with anonymous 
classes. However, I can declare the UDF as a private inner class (didn't want 
to change much code).
@HuangWHWHW the `log2` method already existed and in the issue, I proposed 
to rename it. Maybe `getBitSize(long value)`? As for the "proof": if each task 
id is smaller than the total number of parallel tasks t, its bit representation 
is also smaller than the bit representation of t. Thus, when we shift the 
counter by the number of bits of t, there cannot be a collision for different 
task ids


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2590] fixing DataSetUtils.zipWithUnique...

2015-08-29 Thread s1ck
GitHub user s1ck opened a pull request:

https://github.com/apache/flink/pull/1075

[FLINK-2590] fixing DataSetUtils.zipWithUniqueId()

* modified algorithm as explained in the issue
* updated method documentation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1ck/flink FLINK-2590

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1075


commit ab362b5b5ae390449972cc03f398d75c0231cb3c
Author: Martin Junghanns martin.jungha...@gmx.net
Date:   2015-08-29T20:51:19Z

[FLINK-2590] fixing DataSetUtils.zipWithUniqueId()

* modified algorithm as explained in the issue
* updated method documentation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---