[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-122117624
  
Hi @pp86, are you still working on this PR? 
I would suggest to open a new PR for the documentation (FLINK-2362).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-122121617
  
No worries :-)
You better use a dedicated branch for your next PR instead of the master 
branch.

I'll go ahead and merge this PR now. Thanks for the contribution!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-122132484
  
Merged, thanks for the contribution @pp86 !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/905


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/905#discussion_r34847588
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/DistinctITCase.java
 ---
@@ -274,4 +277,45 @@ public Integer map(POJO value) throws Exception {
return (int) value.nestedPojo.longNumber;
}
}
+
+   @Test
+   public void testCorrectnessOfDistinctOnAtomic() throws Exception {
+   /*
+   * check correctness of distinct on Integers
+   */
+
+   final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+   DataSetInteger ds = CollectionDataSets.getIntegerDataSet(env);
+   DataSetInteger reduceDs = ds.distinct();
+
+   ListInteger result = reduceDs.collect();
+
+   String expected = 1\n2\n3\n4\n5;
+
+   compareResultAsText(result, expected);
+   }
+
+   @Test
+   public void testCorrectnessOfDistinctOnAtomicWithSelectAllChar() throws 
Exception {
+   /*
--- End diff --

Looks like misaligned for the comment block.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-122131578
  
Good catch @hsaputra :-)
I'll fix it before merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-16 Thread pp86
Github user pp86 commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-122118573
  
Hi @fhueske ,

I'm very sorry, this is the first time I work with github, I am doing some 
mess :| 
But, I think I finally succeeded in removing the unwanted commits for 
documentation.

I am not working on the code any more. 
Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-15 Thread pp86
Github user pp86 commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121739943
  
Dear @fhueske ,
I hope I satisfied your requests! Let me know if some other changes are 
needed.

As you suggested I took the issue on documentation.

Thanks,
Pietro


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-15 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121775689
  
Looks good to me, thanks!
Will try the changes tomorrow and merge.

Thanks for helping with the documentation!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread pp86
Github user pp86 commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121192327
  
hello @chiwanpark!  what do you think about the tests? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread pp86
Github user pp86 commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121187436
  
sure thing!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121286738
  
Hi, thanks for the pull request.

I noticed that the `CollectionDataSets.getStringDataSet` data set does not 
contain any duplicates. Therefore, it is not well suited to test the distinct 
operator.
There also a few more tests for the Distinct operator that need to be 
adapted:
- DistinctITCase.java
- DistinctOperatorTest.java
- DistinctOperatorTest.scala

The Unit tests (*Test) check if correct arguments are accepted and invalid 
arguments are rejected. It would be nice, if you could try to do a distinct() 
on a DataSetYourObject, where YourObject is not a Pojo but a generic type 
which does not implement `Comparable`, i.e., it is not a Key type.

Thank you, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121283389
  
@pp86 Looks good to merge. :)
If another committer gives LGTM to this PR, I'll merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread pp86
Github user pp86 commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121406113
  
Hi guys,
I added some tests.

I also added a test on a generic data type which does not implement 
Comparable. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/905#discussion_r34626753
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -618,9 +620,9 @@ public long count() throws Exception {
}

/**
-* Returns a distinct set of a {@link Tuple} {@link DataSet} using all 
fields of the tuple.
+* Returns a distinct set of a {@link DataSet} using all fields of the 
tuple.
 * p
-* Note: This operator can only be applied to Tuple DataSets.
+* If the input is a {@link Tuple} {@link DataSet}, uses all fields of 
the tuple.
--- End diff --

Can you change this to If the input is a composite type (Tuple or Pojo 
type), distinct is performed on all fields and each field must be a key type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/905#discussion_r34626780
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -618,9 +620,9 @@ public long count() throws Exception {
}

/**
-* Returns a distinct set of a {@link Tuple} {@link DataSet} using all 
fields of the tuple.
+* Returns a distinct set of a {@link DataSet} using all fields of the 
tuple.
--- End diff --

Can you remove the using all fields of the tuple part?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/905#discussion_r34627102
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/DistinctITCase.java
 ---
@@ -274,4 +277,81 @@ public Integer map(POJO value) throws Exception {
return (int) value.nestedPojo.longNumber;
}
}
+
+   @Test
+   public void testCorrectnessOfDistinctOnAtomic() throws Exception {
+   /*
+   * check correctness of distinct on Integers
+   */
+
+   final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+   DataSetInteger ds = CollectionDataSets.getIntegerDataSet(env);
+   DataSetInteger reduceDs = ds.distinct();
+
+   ListInteger result = reduceDs.collect();
+
+   String expected = 1\n2\n3\n4\n5;
+
+   compareResultAsText(result, expected);
+   }
+
+   @Test
+   public void testCorrectnessOfDistinctOnAtomicWithSelectAllChar() throws 
Exception {
+   /*
+   * check correctness of distinct on Strings, using 
Keys.ExpressionKeys.SELECT_ALL_CHAR
+   */
+
+   final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+   DataSetString ds = CollectionDataSets.getStringDataSet(env);
+   DataSetString reduceDs = ds.union(ds).distinct(*);
+
+   ListString result = reduceDs.collect();
+
+   String expected = I am fine.\n +
+   Luke Skywalker\n +
+   LOL\n +
+   Hello world, how are you?\n +
+   Hi\n +
+   Hello world\n +
+   Hello\n +
+   Random comment\n;
+
+   compareResultAsText(result, expected);
+   }
+
+   @Test(expected = InvalidProgramException.class)
+   public void testDistinctOnNotKeyDataType() throws Exception {
--- End diff --

We usually add such failing tests to the unit tests 
(DistinctOperatorTest.java) instead of integration tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread pp86
GitHub user pp86 opened a pull request:

https://github.com/apache/flink/pull/905

[FLINK-1963] Improve distinct() transformation

- add the possibility to use distinct() on Atomic data types
- add the possibility to use distinct(_)/distinct(*) on Atomic data 
types



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pp86/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/905.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #905


commit f9a69111c1050d45d14ffb332411af6663482dd5
Author: pietro pinoli pie...@pietros-mbp.lan
Date:   2015-07-13T11:32:20Z

[FLINK-1963] Improve distinct() transformation
- add the possibility to use distinct() on Atomic data types
- add the possibility to use distinct(_)/distinct(*) on Atomic data 
types

commit da699987852d7481843d8b8b60c976d3d8e61a69
Author: pietro pinoli pie...@pietros-mbp.lan
Date:   2015-07-13T11:58:28Z

[FLINK-1963] Improve distinct() transformation …
- add the possibility to use distinct() on Atomic data types
- add the possibility to use distinct(_)/distinct(*) on Atomic data 
types

commit 911db2d2efcca227d1fdcf595cb6c5a2cd4e92ce
Author: pietro pinoli pie...@pietros-mbp.lan
Date:   2015-07-13T12:01:20Z

Clean code




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-120913343
  
Hi, @pp86 Thanks for your contribution.

But I think that using `AutoSelector` is not the best approach to improve 
distinct transformation. In Flink, a `KeySelector` converts a `DataSetO` to 
`DataSetTuple2K, O` and uses the first element of the tuple as key. For 
atomic types, `AutoSelector` creates `DataSetTuple2V, V` which 
unnecessarily duplicated data.

I recommend `Keys.ExpressionKeys` when the user call `distinct()` method on 
atomic data types.

And It would be better to add the test cases for this changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121119196
  
@pp86 It seems okay but we need to check this change with some test cases. 
Could you add some test cases into `DistinctITCase` in `flink-tests` module?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-120922707
  
@pp86 Hi you can modify this pull request by adding commit in your branch. 
(pp86:master)
I think reopening this pull request and adding commit is better than 
opening new pull request. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread pp86
Github user pp86 commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-120922481
  
Dear Chiwan,
thanks for you suggestions.
I'll integrate them and open a new pull request. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread pp86
Github user pp86 closed the pull request at:

https://github.com/apache/flink/pull/905


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---