[jira] [Resolved] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse

2015-08-23 Thread Chiwan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiwan Park resolved FLINK-2560.

   Resolution: Fixed
Fix Version/s: 0.10

Fixed at 9c7f769388d90c3a79d8c08995d4eae892b23a6e

 Flink-Avro Plugin cannot be handled by Eclipse
 --

 Key: FLINK-2560
 URL: https://issues.apache.org/jira/browse/FLINK-2560
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Trivial
 Fix For: 0.10


 Eclipse always shows the following error:
 {noformat}
 Description   ResourcePathLocationType Plugin execution 
 not covered by lifecycle configuration: 
 org.apache.avro:avro-maven-plugin:1.7.7:schema (execution: default, phase: 
 generate-sources)   pom.xml /flink-avro line 134Maven Project 
 Build Lifecycle Mapping problem
 {noformat}
 This can be fixed by disabling the plugin within Eclipse via pluginManagement ... 
 lifecycleMappingMetadata
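 For reference, a sketch of the kind of m2e lifecycle mapping meant here (the
 plugin coordinates match the error above; the exact block is an assumption,
 not the committed fix):
 {code:xml}
 <pluginManagement>
   <plugins>
     <!-- Tell Eclipse's m2e to ignore the avro-maven-plugin execution -->
     <plugin>
       <groupId>org.eclipse.m2e</groupId>
       <artifactId>lifecycle-mapping</artifactId>
       <version>1.0.0</version>
       <configuration>
         <lifecycleMappingMetadata>
           <pluginExecutions>
             <pluginExecution>
               <pluginExecutionFilter>
                 <groupId>org.apache.avro</groupId>
                 <artifactId>avro-maven-plugin</artifactId>
                 <versionRange>[1.7.7,)</versionRange>
                 <goals>
                   <goal>schema</goal>
                 </goals>
               </pluginExecutionFilter>
               <action>
                 <ignore/>
               </action>
             </pluginExecution>
           </pluginExecutions>
         </lifecycleMappingMetadata>
       </configuration>
     </plugin>
   </plugins>
 </pluginManagement>
 {code}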



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2557) Manual type information via returns fails in DataSet API

2015-08-23 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-2557:
---

Assignee: Chesnay Schepler

 Manual type information via returns fails in DataSet API
 --

 Key: FLINK-2557
 URL: https://issues.apache.org/jira/browse/FLINK-2557
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Matthias J. Sax
Assignee: Chesnay Schepler

 I changed the WordCount example as below and get an exception:
 Tokenizer is changed to this (removed generics and added a cast to String):
 {code:java}
 public static final class Tokenizer implements FlatMapFunction {
   public void flatMap(Object value, Collector out) {
     String[] tokens = ((String) value).toLowerCase().split("\\W+");
     for (String token : tokens) {
       if (token.length() > 0) {
         out.collect(new Tuple2<String, Integer>(token, 1));
       }
     }
   }
 }
 {code}
 I added a call to returns() here:
 {code:java}
 DataSet<Tuple2<String, Integer>> counts =
   text.flatMap(new Tokenizer()).returns("Tuple2<String,Integer>")
   .groupBy(0).sum(1);
 {code}
 The exception is:
 {noformat}
 Exception in thread "main" java.lang.IllegalArgumentException: The types of 
 the interface org.apache.flink.api.common.functions.FlatMapFunction could not 
 be inferred. Support for synthetic interfaces, lambdas, and generic types is 
 limited at this point.
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:686)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterTypeFromGenericType(TypeExtractor.java:710)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:673)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:365)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:279)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:120)
   at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:262)
   at 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:69)
 {noformat}
 Fix:
 This should not immediately fail, but instead only give a MissingTypeInfo so 
 that type hints would work.
 The error message is also wrong, btw: it should state that raw types are not 
 supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2563] [gelly] changed Graph's run() to ...

2015-08-23 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/1042#issuecomment-133881425
  
Exactly what I had in mind. +1 to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool

2015-08-23 Thread Martin Liesenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708474#comment-14708474
 ] 

Martin Liesenberg commented on FLINK-2017:
--

An Option object to encapsulate the parameters should probably be used in 
ParameterTool as well, right?

What I have come up with is a generic Option class and a corresponding 
RequiredParameter class.
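
For illustration, a minimal sketch of such a generic Option class (all names 
and methods are assumptions mirroring the test case quoted below, not the 
actual implementation):
{code:java}
// Hypothetical sketch of a generic Option with a fluent builder API,
// mirroring the usage in the test case quoted below.
public class Option {
    private final String longName;
    private String shortName;
    private String helpText;
    private String defaultValue;

    public Option(String longName) {
        this.longName = longName;
    }

    public Option alt(String shortName) {
        this.shortName = shortName;
        return this;
    }

    public Option help(String helpText) {
        this.helpText = helpText;
        return this;
    }

    public Option defaultValue(String value) {
        this.defaultValue = value;
        return this;
    }

    public String getName() {
        return longName;
    }
}
{code}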

 Add predefined required parameters to ParameterTool
 ---

 Key: FLINK-2017
 URL: https://issues.apache.org/jira/browse/FLINK-2017
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Robert Metzger
  Labels: starter

 In FLINK-1525 we've added the {{ParameterTool}}.
 During the PR review, there was a request for required parameters.
 This issue is about implementing a facility to define required parameters. 
 The tool should also be able to print a help menu with a list of all 
 parameters.
 This test case shows my initial ideas on how to design the API:
 {code}
   @Test
   public void requiredParameters() {
     RequiredParameters required = new RequiredParameters();
     // parameter with long and short variant
     Option input = required.add("input").alt("i").help("Path to input file or directory");
     // parameter only with long variant
     required.add("output");
     // parameter with type
     Option parallelism = required.add("parallelism").alt("p").type(Integer.class);
     // parameter with default value, specifying the type.
     Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12).help("Number specifying the number of parallel data source instances");
     Option executionType = required.add("executionType").alt("et").defaultValue("pipelined").choices("pipelined", "batch");
     ParameterUtil parameter = ParameterUtil.fromArgs(new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
     required.check(parameter);
     required.printHelp();
     required.checkAndPopulate(parameter);
     String inputString = input.get();
     int par = parallelism.getInteger();
     String output = parameter.get("output");
     int sourcePar = parameter.getInteger(spOption.getName());
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2557) Manual type information via returns fails in DataSet API

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708505#comment-14708505
 ] 

ASF GitHub Bot commented on FLINK-2557:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1045

[FLINK-2557] TypeExtractor properly returns MissingTypeInfo

This fix is not really obvious so let me explain:

getParameterType() is called from two different places in the TypeExtractor: 
to validate the input type and to extract the output type.

Both cases consider the possibility that getParameterType() fails, but 
check for different exceptions. 

The TypeExtractor only returns a MissingTypeInfo if it encounters an 
InvalidTypesException; IllegalArgumentExceptions are not caught. This is what 
@mjsax encountered.
Changing the exception type causes the TypeExtractor to properly return a 
MissingTypeInfo, which is later overridden by the returns(...) call.

In order for the input validation to still work properly as well, it now 
catches InvalidTypesExceptions instead.
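
A minimal sketch of the control flow described above (helper names and the 
signature are assumptions, not the actual TypeExtractor code):
{code:java}
// Hypothetical sketch: throw/catch InvalidTypesException instead of
// IllegalArgumentException, so the extractor falls back to a
// MissingTypeInfo that a later returns(...) hint can override.
private static <OUT> TypeInformation<OUT> extractReturnType(
        Function function, Type parameterType) {
    try {
        // hypothetical helper standing in for privateCreateTypeInfo(...)
        return createTypeInfoFromParameter(function, parameterType);
    } catch (InvalidTypesException e) {
        // previously an IllegalArgumentException escaped here and aborted
        // the job before any type hint could be applied
        return (TypeInformation<OUT>) new MissingTypeInfo(function.toString(), e);
    }
}
{code}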

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 2557_types

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1045


commit 1c1dc459915c875ab0a4412aa3ef0a844f092171
Author: zentol s.mo...@web.de
Date:   2015-08-23T19:41:44Z

[FLINK-2557] TypeExtractor properly returns MissingTypeInfo




 Manual type information via returns fails in DataSet API
 --

 Key: FLINK-2557
 URL: https://issues.apache.org/jira/browse/FLINK-2557
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Matthias J. Sax
Assignee: Chesnay Schepler

 I changed the WordCount example as below and get an exception:
 Tokenizer is changed to this (removed generics and added a cast to String):
 {code:java}
 public static final class Tokenizer implements FlatMapFunction {
   public void flatMap(Object value, Collector out) {
     String[] tokens = ((String) value).toLowerCase().split("\\W+");
     for (String token : tokens) {
       if (token.length() > 0) {
         out.collect(new Tuple2<String, Integer>(token, 1));
       }
     }
   }
 }
 {code}
 I added a call to returns() here:
 {code:java}
 DataSet<Tuple2<String, Integer>> counts =
   text.flatMap(new Tokenizer()).returns("Tuple2<String,Integer>")
   .groupBy(0).sum(1);
 {code}
 The exception is:
 {noformat}
 Exception in thread "main" java.lang.IllegalArgumentException: The types of 
 the interface org.apache.flink.api.common.functions.FlatMapFunction could not 
 be inferred. Support for synthetic interfaces, lambdas, and generic types is 
 limited at this point.
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:686)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterTypeFromGenericType(TypeExtractor.java:710)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:673)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:365)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:279)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:120)
   at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:262)
   at 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:69)
 {noformat}
 Fix:
 This should not immediately fail, but instead only give a MissingTypeInfo so 
 that type hints would work.
 The error message is also wrong, btw: it should state that raw types are not 
 supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys

2015-08-23 Thread zentol
GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1043

[FLINK-2565] Support primitive Arrays as keys

Adds a comparator and test for every primitive array type.
Modifies the CustomType2 class in GroupingTest to retain a field with an 
unsupported type.
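
For illustration, a minimal sketch of the lexicographic comparison such a 
comparator needs for one primitive array type (a sketch under the assumption 
that each primitive type gets an analogous method; this is not the PR's code):
{code:java}
// Hypothetical sketch: lexicographic comparison of byte arrays, with the
// shorter array ordered first on a common prefix.
public static int compareByteArrays(byte[] first, byte[] second) {
    int len = Math.min(first.length, second.length);
    for (int i = 0; i < len; i++) {
        int cmp = Byte.compare(first[i], second[i]);
        if (cmp != 0) {
            return cmp;
        }
    }
    return Integer.compare(first.length, second.length);
}
{code}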

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 2565_arrayKey

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1043.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1043


commit 7551a47e60186a91ecc1df364f1a3ae0c9474a3f
Author: zentol s.mo...@web.de
Date:   2015-08-23T13:36:47Z

[FLINK-2565] Support primitive Arrays as keys




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2565) Support primitive arrays as keys

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708409#comment-14708409
 ] 

ASF GitHub Bot commented on FLINK-2565:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1043

[FLINK-2565] Support primitive Arrays as keys

Adds a comparator and test for every primitive array type.
Modifies the CustomType2 class in GroupingTest to retain a field with an 
unsupported type.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 2565_arrayKey

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1043.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1043


commit 7551a47e60186a91ecc1df364f1a3ae0c9474a3f
Author: zentol s.mo...@web.de
Date:   2015-08-23T13:36:47Z

[FLINK-2565] Support primitive Arrays as keys




 Support primitive arrays as keys
 

 Key: FLINK-2565
 URL: https://issues.apache.org/jira/browse/FLINK-2565
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2560] Flink-Avro Plugin cannot be handl...

2015-08-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1041


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708434#comment-14708434
 ] 

ASF GitHub Bot commented on FLINK-2560:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1041


 Flink-Avro Plugin cannot be handled by Eclipse
 --

 Key: FLINK-2560
 URL: https://issues.apache.org/jira/browse/FLINK-2560
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Trivial

 Eclipse always shows the following error:
 {noformat}
 Description   ResourcePathLocationType Plugin execution 
 not covered by lifecycle configuration: 
 org.apache.avro:avro-maven-plugin:1.7.7:schema (execution: default, phase: 
 generate-sources)   pom.xml /flink-avro line 134Maven Project 
 Build Lifecycle Mapping problem
 {noformat}
 This can be fixed by disabling the plugin within Eclipse via pluginManagement ... 
 lifecycleMappingMetadata



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2557) Manual type information via returns fails in DataSet API

2015-08-23 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708463#comment-14708463
 ] 

Aljoscha Krettek commented on FLINK-2557:
-

Yes, I think that's ok.

 Manual type information via returns fails in DataSet API
 --

 Key: FLINK-2557
 URL: https://issues.apache.org/jira/browse/FLINK-2557
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Matthias J. Sax
Assignee: Chesnay Schepler

 I changed the WordCount example as below and get an exception:
 Tokenizer is changed to this (removed generics and added a cast to String):
 {code:java}
 public static final class Tokenizer implements FlatMapFunction {
   public void flatMap(Object value, Collector out) {
     String[] tokens = ((String) value).toLowerCase().split("\\W+");
     for (String token : tokens) {
       if (token.length() > 0) {
         out.collect(new Tuple2<String, Integer>(token, 1));
       }
     }
   }
 }
 {code}
 I added a call to returns() here:
 {code:java}
 DataSet<Tuple2<String, Integer>> counts =
   text.flatMap(new Tokenizer()).returns("Tuple2<String,Integer>")
   .groupBy(0).sum(1);
 {code}
 The exception is:
 {noformat}
 Exception in thread "main" java.lang.IllegalArgumentException: The types of 
 the interface org.apache.flink.api.common.functions.FlatMapFunction could not 
 be inferred. Support for synthetic interfaces, lambdas, and generic types is 
 limited at this point.
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:686)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterTypeFromGenericType(TypeExtractor.java:710)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:673)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:365)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:279)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:120)
   at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:262)
   at 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:69)
 {noformat}
 Fix:
 This should not immediately fail, but instead only give a MissingTypeInfo so 
 that type hints would work.
 The error message is also wrong, btw: it should state that raw types are not 
 supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2030][FLINK-2274][core][utils]Histogram...

2015-08-23 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-133870941
  
+1 for moving histogram functions into `DataSetUtils`. It would be helpful 
for range partitioning. I'll review this in the next few days.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708418#comment-14708418
 ] 

ASF GitHub Bot commented on FLINK-2030:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-133870941
  
+1 for moving histogram functions into `DataSetUtils`. It would be helpful 
for range partitioning. I'll review this in the next few days.


 Implement an online histogram with Merging and equalization features
 

 Key: FLINK-2030
 URL: https://issues.apache.org/jira/browse/FLINK-2030
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library
Reporter: Sachin Goel
Assignee: Sachin Goel
Priority: Minor
  Labels: ML

 For the implementation of the decision tree in 
 https://issues.apache.org/jira/browse/FLINK-1727, we need to implement an 
 histogram with online updates, merging and equalization features. A reference 
 implementation is provided in [1]
 [1].http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708429#comment-14708429
 ] 

ASF GitHub Bot commented on FLINK-2560:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1041#issuecomment-133874756
  
Looks good to merge. I'll merge this.


 Flink-Avro Plugin cannot be handled by Eclipse
 --

 Key: FLINK-2560
 URL: https://issues.apache.org/jira/browse/FLINK-2560
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Trivial

 Eclipse always shows the following error:
 {noformat}
 Description   ResourcePathLocationType Plugin execution 
 not covered by lifecycle configuration: 
 org.apache.avro:avro-maven-plugin:1.7.7:schema (execution: default, phase: 
 generate-sources)   pom.xml /flink-avro line 134Maven Project 
 Build Lifecycle Mapping problem
 {noformat}
 This can be fixed by disabling the plugin within Eclipse via pluginManagement ... 
 lifecycleMappingMetadata



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2560] Flink-Avro Plugin cannot be handl...

2015-08-23 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1041#issuecomment-133874756
  
Looks good to merge. I'll merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2557] TypeExtractor properly returns Mi...

2015-08-23 Thread zentol
GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1045

[FLINK-2557] TypeExtractor properly returns MissingTypeInfo

This fix is not really obvious so let me explain:

getParameterType() is called from two different places in the TypeExtractor: 
to validate the input type and to extract the output type.

Both cases consider the possibility that getParameterType() fails, but 
check for different exceptions. 

The TypeExtractor only returns a MissingTypeInfo if it encounters an 
InvalidTypesException; IllegalArgumentExceptions are not caught. This is what 
@mjsax encountered.
Changing the exception type causes the TypeExtractor to properly return a 
MissingTypeInfo, which is later overridden by the returns(...) call.

In order for the input validation to still work properly as well, it now 
catches InvalidTypesExceptions instead.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 2557_types

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1045


commit 1c1dc459915c875ab0a4412aa3ef0a844f092171
Author: zentol s.mo...@web.de
Date:   2015-08-23T19:41:44Z

[FLINK-2557] TypeExtractor properly returns MissingTypeInfo




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2556) Fix/Refactor pre-flight Key validation

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708412#comment-14708412
 ] 

ASF GitHub Bot commented on FLINK-2556:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1044

[FLINK-2556] Refactor/Fix pre-flight Key validation

Removed redundant key validation in DistinctOperator.
Keys constructors now make sure the type of every key is an instance of 
AtomicType/CompositeType, and that type.isKeyType() is true.
Additionally, the ExpressionKeys int[] constructor explicitly rejects Tuple0.
Changes one test that actually tried something that shouldn't work in the 
first place.
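
For illustration, a minimal sketch of the kind of pre-flight check described 
(the class name and message are assumptions, not the actual Keys code):
{code:java}
import org.apache.flink.api.common.InvalidProgramException;
import org.apache.flink.api.common.typeinfo.AtomicType;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.CompositeType;

public final class KeyValidationSketch {
    // Hypothetical sketch: every key type must be an AtomicType or a
    // CompositeType, and must report itself as usable as a key.
    public static void validateKeyType(TypeInformation<?> keyType) {
        if (!(keyType instanceof AtomicType || keyType instanceof CompositeType)) {
            throw new InvalidProgramException(
                    "This type (" + keyType + ") cannot be used as key.");
        }
        if (!keyType.isKeyType()) {
            throw new InvalidProgramException(
                    "This type (" + keyType + ") cannot be used as key.");
        }
    }
}
{code}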

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink isKeyType_check

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1044


commit 7a57b6ef2ecdc7adaf770f8585cf8f974c684705
Author: zentol s.mo...@web.de
Date:   2015-08-23T13:57:34Z

[FLINK-2556] Refactor/Fix pre-flight Key validation




 Fix/Refactor pre-flight Key validation
 --

 Key: FLINK-2556
 URL: https://issues.apache.org/jira/browse/FLINK-2556
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 The pre-flight key validation checks are inconsistent, at times don't 
 actually check anything, and in at least one case are done redundantly.
 For example,
 * you can group on a tuple containing a non-Atomic-/CompositeType using 
 String[] KeyExpressions (see FLINK-2541)
 * you can group on an AtomicType even though isKeyType() returns false, if it 
 is contained in a tuple
 * for distinct(String[]...) the above fails in the DistinctOperator 
 constructor, as it validates the key again for some reason.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2556] Refactor/Fix pre-flight Key valid...

2015-08-23 Thread zentol
GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1044

[FLINK-2556] Refactor/Fix pre-flight Key validation

Removed redundant key validation in DistinctOperator.
Keys constructors now make sure the type of every key is an instance of 
AtomicType/CompositeType, and that type.isKeyType() is true.
Additionally, the ExpressionKeys int[] constructor explicitly rejects Tuple0.
Changes one test that actually tried something that shouldn't work in the 
first place.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink isKeyType_check

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1044


commit 7a57b6ef2ecdc7adaf770f8585cf8f974c684705
Author: zentol s.mo...@web.de
Date:   2015-08-23T13:57:34Z

[FLINK-2556] Refactor/Fix pre-flight Key validation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2557) Manual type information via returns fails in DataSet API

2015-08-23 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708460#comment-14708460
 ] 

Chesnay Schepler commented on FLINK-2557:
-

I have a fix ready so that a MissingTypeInfo is returned, but am unsure about 
the error message. Should raw types just be added to the list of unsupported 
things, like "Support for synthetic interfaces, lambdas, and generic or raw 
types is limited at this point"?

 Manual type information via returns fails in DataSet API
 --

 Key: FLINK-2557
 URL: https://issues.apache.org/jira/browse/FLINK-2557
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Matthias J. Sax
Assignee: Chesnay Schepler

 I changed the WordCount example as below and get an exception:
 Tokenizer is changed to this (removed generics and added a cast to String):
 {code:java}
 public static final class Tokenizer implements FlatMapFunction {
   public void flatMap(Object value, Collector out) {
     String[] tokens = ((String) value).toLowerCase().split("\\W+");
     for (String token : tokens) {
       if (token.length() > 0) {
         out.collect(new Tuple2<String, Integer>(token, 1));
       }
     }
   }
 }
 {code}
 I added a call to returns() here:
 {code:java}
 DataSet<Tuple2<String, Integer>> counts =
   text.flatMap(new Tokenizer()).returns("Tuple2<String,Integer>")
   .groupBy(0).sum(1);
 {code}
 The exception is:
 {noformat}
 Exception in thread "main" java.lang.IllegalArgumentException: The types of 
 the interface org.apache.flink.api.common.functions.FlatMapFunction could not 
 be inferred. Support for synthetic interfaces, lambdas, and generic types is 
 limited at this point.
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:686)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterTypeFromGenericType(TypeExtractor.java:710)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:673)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:365)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:279)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:120)
   at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:262)
   at 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:69)
 {noformat}
 Fix:
 This should not immediately fail, but instead only give a MissingTypeInfo so 
 that type hints would work.
 The error message is also wrong, btw: it should state that raw types are not 
 supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2563) Gelly's Graph Algorithm Interface is limited

2015-08-23 Thread Andra Lungu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andra Lungu updated FLINK-2563:
---
Summary: Gelly's Graph Algorithm Interface is limited  (was: Gelly's Graph 
Algorithm Interface is limites)

 Gelly's Graph Algorithm Interface is limited
 

 Key: FLINK-2563
 URL: https://issues.apache.org/jira/browse/FLINK-2563
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Andra Lungu

 Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to 
 return the same type of Graph.
 public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception;
 In numerous cases, one needs to return a single value, or a modified graph. 
 Off the top of my head, say one would like to implement a Triangle Count 
 library method. That takes as input a Graph and returns the total number of 
 triangles. 
 https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java
 With the current Gelly abstractions, something like this cannot be supported. 
 Also if I initially had a Graph of Long, Long, NullValue and my algorithm 
 changed the edge values to type Double, for instance, I would again have 
 created an implementation which is not supported. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2563) Gelly's Graph Algorithm Interface is limited

2015-08-23 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri reassigned FLINK-2563:


Assignee: Vasia Kalavri

 Gelly's Graph Algorithm Interface is limited
 

 Key: FLINK-2563
 URL: https://issues.apache.org/jira/browse/FLINK-2563
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Andra Lungu
Assignee: Vasia Kalavri

 Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to 
 return the same type of Graph.
 public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception;
 In numerous cases, one needs to return a single value, or a modified graph. 
 Off the top of my head, say one would like to implement a Triangle Count 
 library method. That takes as input a Graph and returns the total number of 
 triangles. 
 https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java
 With the current Gelly abstractions, something like this cannot be supported. 
 Also if I initially had a Graph of Long, Long, NullValue and my algorithm 
 changed the edge values to type Double, for instance, I would again have 
 created an implementation which is not supported. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2563) Gelly's Graph Algorithm Interface is limited

2015-08-23 Thread Andra Lungu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708345#comment-14708345
 ] 

Andra Lungu commented on FLINK-2563:


It's all yours :)

 Gelly's Graph Algorithm Interface is limited
 

 Key: FLINK-2563
 URL: https://issues.apache.org/jira/browse/FLINK-2563
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Andra Lungu

 Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to 
 return the same type of Graph.
 public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception;
 In numerous cases, one needs to return a single value, or a modified graph. 
 Off the top of my head, say one would like to implement a Triangle Count 
 library method. That takes as input a Graph and returns the total number of 
 triangles. 
 https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java
 With the current Gelly abstractions, something like this cannot be supported. 
 Also if I initially had a Graph of Long, Long, NullValue and my algorithm 
 changed the edge values to type Double, for instance, I would again have 
 created an implementation which is not supported. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2548) In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset

2015-08-23 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay resolved FLINK-2548.

Resolution: Won't Fix

 In a VertexCentricIteration, the run time of one iteration should be 
 proportional to the size of the workset
 

 Key: FLINK-2548
 URL: https://issues.apache.org/jira/browse/FLINK-2548
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Gabor Gevay
Assignee: Gabor Gevay

 Currently, the performance of vertex centric iteration is suboptimal in those 
 iterations where the workset is small, because the complexity of one 
 iteration includes the number of edges and vertices of the graph, due to the 
 coGroups:
 VertexCentricIteration.buildMessagingFunction does a coGroup between the 
 edges and the workset, to get the neighbors to the messaging UDF. This is 
 problematic from a performance point of view, because the coGroup UDF gets 
 called on all the edge groups, including those that are not getting any 
 messages.
 An analogous problem is present in 
 VertexCentricIteration.createResultSimpleVertex at the creation of the 
 updates: a coGroup happens between the messages and the solution set, which 
 has the number of vertices of the graph included in its complexity.
 Both of these coGroups could be avoided by doing a join instead (with the 
 same keys that the coGroup uses), and then a groupBy. The complexity of these 
 operations would be dominated by the size of the workset, as opposed to the 
 number of edges or vertices of the graph. The joins should have the edges and 
 the solution set at the build side to achieve this complexity. (They will not 
 be rebuilt at every iteration.)
 I made some experiments with this, and the initial results seem promising. On 
 some workloads, this achieves a 2 times speedup, because later iterations 
 often have quite small worksets, and these get a huge speedup from this.
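
For illustration, a minimal, hypothetical sketch of the join-then-groupBy 
pattern proposed above, on a toy workset (the types, names, and messaging 
logic are assumptions, not the Gelly code):
{code:java}
import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WorksetJoinSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // edges as (sourceId, targetId); workset as (vertexId, value)
        DataSet<Tuple2<Long, Long>> edges = env.fromElements(
                new Tuple2<Long, Long>(1L, 2L), new Tuple2<Long, Long>(2L, 3L));
        DataSet<Tuple2<Long, Double>> workset = env.fromElements(
                new Tuple2<Long, Double>(1L, 0.5));

        // join instead of coGroup: only edges whose source appears in the
        // workset are processed, so the cost follows the workset size
        DataSet<Tuple2<Long, Double>> messages = workset
                .join(edges).where(0).equalTo(0)
                .with(new JoinFunction<Tuple2<Long, Double>, Tuple2<Long, Long>, Tuple2<Long, Double>>() {
                    @Override
                    public Tuple2<Long, Double> join(Tuple2<Long, Double> vertex, Tuple2<Long, Long> edge) {
                        // send the vertex value as a message to the edge target
                        return new Tuple2<Long, Double>(edge.f1, vertex.f1);
                    }
                })
                // the groupBy replaces the per-target grouping the coGroup provided
                .groupBy(0)
                .reduceGroup(new GroupReduceFunction<Tuple2<Long, Double>, Tuple2<Long, Double>>() {
                    @Override
                    public void reduce(Iterable<Tuple2<Long, Double>> values, Collector<Tuple2<Long, Double>> out) {
                        long id = -1L;
                        double sum = 0.0;
                        for (Tuple2<Long, Double> msg : values) {
                            id = msg.f0;
                            sum += msg.f1;
                        }
                        out.collect(new Tuple2<Long, Double>(id, sum));
                    }
                });

        messages.print();
    }
}
{code}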



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2541) TypeComparator creation fails for T2<T1<byte[]>, byte[]>

2015-08-23 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler resolved FLINK-2541.
-
Resolution: Invalid

 TypeComparator creation fails for T2<T1<byte[]>, byte[]>
 

 Key: FLINK-2541
 URL: https://issues.apache.org/jira/browse/FLINK-2541
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When running the following job as a JavaProgramTest:
 {code}
 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 DataSet<Tuple2<Tuple1<byte[]>, byte[]>> data = env.fromElements(
   new Tuple2<Tuple1<byte[]>, byte[]>(
     new Tuple1<byte[]>(new byte[]{1, 2}), 
     new byte[]{1, 2, 3}),
   new Tuple2<Tuple1<byte[]>, byte[]>(
     new Tuple1<byte[]>(new byte[]{1, 2}), 
     new byte[]{1, 2, 3}));
 data.groupBy("f0.f0")
   .reduceGroup(new DummyReduce<Tuple2<Tuple1<byte[]>, byte[]>>())
   .print();
 {code}
 with DummyReduce defined as
 {code}
 public static class DummyReduce<IN> implements GroupReduceFunction<IN, IN> {
   @Override
   public void reduce(Iterable<IN> values, Collector<IN> out) throws Exception {
     for (IN value : values) {
       out.collect(value);
     }
   }
 }
 {code}
 I encountered the following exception:
 Tuple comparator creation has a bug
 java.lang.IllegalArgumentException: Tuple comparator creation has a bug
   at 
 org.apache.flink.api.java.typeutils.TupleTypeInfo.getNewComparator(TupleTypeInfo.java:131)
   at 
 org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:133)
   at 
 org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:122)
   at 
 org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.getTypeComparator(GroupReduceOperatorBase.java:155)
   at 
 org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.executeOnCollections(GroupReduceOperatorBase.java:184)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:236)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:215)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:176)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:152)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:109)
   at 
 org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:33)
   at 
 org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:35)
   at 
 org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:30)
   at org.apache.flink.api.java.DataSet.collect(DataSet.java:408)
   at org.apache.flink.api.java.DataSet.print(DataSet.java:1349)
   at 
 org.apache.flink.languagebinding.api.java.python.AbstractPythonTest.testProgram(AbstractPythonTest.java:42)
   at 
 org.apache.flink.test.util.JavaProgramTestBase.testJobCollectionExecution(JavaProgramTestBase.java:226)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at 

[jira] [Created] (FLINK-2565) Support primitive arrays as keys

2015-08-23 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-2565:
---

 Summary: Support primitive arrays as keys
 Key: FLINK-2565
 URL: https://issues.apache.org/jira/browse/FLINK-2565
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2541) TypeComparator creation fails for T2<T1<byte[]>, byte[]>

2015-08-23 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708379#comment-14708379
 ] 

Chesnay Schepler commented on FLINK-2541:
-

Since this is not actually an issue of the TypeComparator, I will close this 
ticket and open a new one to support primitive arrays as keys. The validation 
checks will be extended in FLINK-2556.

 TypeComparator creation fails for T2<T1<byte[]>, byte[]>
 

 Key: FLINK-2541
 URL: https://issues.apache.org/jira/browse/FLINK-2541
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When running the following job as a JavaProgramTest:
 {code}
 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 DataSet<Tuple2<Tuple1<byte[]>, byte[]>> data = env.fromElements(
   new Tuple2<Tuple1<byte[]>, byte[]>(
     new Tuple1<byte[]>(new byte[]{1, 2}), 
     new byte[]{1, 2, 3}),
   new Tuple2<Tuple1<byte[]>, byte[]>(
     new Tuple1<byte[]>(new byte[]{1, 2}), 
     new byte[]{1, 2, 3}));
 data.groupBy("f0.f0")
   .reduceGroup(new DummyReduce<Tuple2<Tuple1<byte[]>, byte[]>>())
   .print();
 {code}
 with DummyReduce defined as
 {code}
 public static class DummyReduce<IN> implements GroupReduceFunction<IN, IN> {
   @Override
   public void reduce(Iterable<IN> values, Collector<IN> out) throws Exception {
     for (IN value : values) {
       out.collect(value);
     }
   }
 }
 {code}
 I encountered the following exception:
 Tuple comparator creation has a bug
 java.lang.IllegalArgumentException: Tuple comparator creation has a bug
   at 
 org.apache.flink.api.java.typeutils.TupleTypeInfo.getNewComparator(TupleTypeInfo.java:131)
   at 
 org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:133)
   at 
 org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:122)
   at 
 org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.getTypeComparator(GroupReduceOperatorBase.java:155)
   at 
 org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.executeOnCollections(GroupReduceOperatorBase.java:184)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:236)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:215)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:176)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:152)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125)
   at 
 org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:109)
   at 
 org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:33)
   at 
 org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:35)
   at 
 org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:30)
   at org.apache.flink.api.java.DataSet.collect(DataSet.java:408)
   at org.apache.flink.api.java.DataSet.print(DataSet.java:1349)
   at 
 org.apache.flink.languagebinding.api.java.python.AbstractPythonTest.testProgram(AbstractPythonTest.java:42)
   at 
 org.apache.flink.test.util.JavaProgramTestBase.testJobCollectionExecution(JavaProgramTestBase.java:226)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 

[jira] [Created] (FLINK-2564) Failing Test: RandomSamplerTest

2015-08-23 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2564:
--

 Summary: Failing Test: RandomSamplerTest
 Key: FLINK-2564
 URL: https://issues.apache.org/jira/browse/FLINK-2564
 Project: Flink
  Issue Type: Bug
Reporter: Matthias J. Sax


{noformat}
Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.943 sec 
<<< FAILURE! - in org.apache.flink.api.java.sampling.RandomSamplerTest
testPoissonSamplerFraction(org.apache.flink.api.java.sampling.RandomSamplerTest)
  Time elapsed: 0.017 sec  <<< FAILURE!
java.lang.AssertionError: expected fraction: 0.01, result fraction: 0.011300
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifySamplerFraction(RandomSamplerTest.java:249)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.testPoissonSamplerFraction(RandomSamplerTest.java:116)

Results :
Failed tests:
Successfully installed excon-0.33.0
RandomSamplerTest.testPoissonSamplerFraction:116->verifySamplerFraction:249 
expected fraction: 0.01, result fraction: 0.011300
{noformat}

Full log: https://travis-ci.org/apache/flink/jobs/76720572



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2563] [gelly] changed Graph's run() to ...

2015-08-23 Thread vasia
GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/1042

[FLINK-2563] [gelly] changed Graph's run() to return an arbitrary result 
type

Added a type parameter to the 'GraphAlgorithm' interface to allow 
implementing library methods that return single values, Graphs of different 
types, etc.
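
For reference, a sketch of the generalized interface this describes (the 
exact signature in the PR is an assumption):
{code:java}
// Hypothetical sketch: a result type parameter T lets library methods
// return a single value, a differently-typed Graph, or anything else.
public interface GraphAlgorithm<K, VV, EV, T> {
    T run(Graph<K, VV, EV> input) throws Exception;
}
{code}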

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink graphAlgorithm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1042.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1042


commit ff7240b9fd1a899c938108604875ea16023e7a78
Author: vasia va...@apache.org
Date:   2015-08-23T11:06:15Z

[FLINK-2563] [gelly] extended the run() method of GraphAlgorithm interface 
to return an arbitrary type




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2563) Gelly's Graph Algorithm Interface is limited

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708391#comment-14708391
 ] 

ASF GitHub Bot commented on FLINK-2563:
---

GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/1042

[FLINK-2563] [gelly] changed Graph's run() to return an arbitrary result 
type

Added a type parameter to the 'GraphAlgorithm' interface to allow 
implementing library methods that return single values, Graphs of different 
types, etc.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink graphAlgorithm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1042.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1042


commit ff7240b9fd1a899c938108604875ea16023e7a78
Author: vasia va...@apache.org
Date:   2015-08-23T11:06:15Z

[FLINK-2563] [gelly] extended the run() method of GraphAlgorithm interface 
to return an arbitrary type




 Gelly's Graph Algorithm Interface is limited
 

 Key: FLINK-2563
 URL: https://issues.apache.org/jira/browse/FLINK-2563
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Andra Lungu
Assignee: Vasia Kalavri

 Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to 
 return the same type of Graph.
 public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception;
 In numerous cases, one needs to return a single value, or a modified graph. 
 Off the top of my head, say one would like to implement a Triangle Count 
 library method. That takes as input a Graph and returns the total number of 
 triangles. 
 https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java
 With the current Gelly abstractions, something like this cannot be supported. 
 Also if I initially had a Graph of Long, Long, NullValue and my algorithm 
 changed the edge values to type Double, for instance, I would again have 
 created an implementation which is not supported. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2548) In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset

2015-08-23 Thread Gabor Gevay (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708364#comment-14708364
 ] 

Gabor Gevay commented on FLINK-2548:


OK, you are probably right.

I ran some more tests, and it seems that the issue in my use case is more with 
the serialization. In other cases, when the serialization of the vertex IDs is 
cheaper, the coGroup implementation does OK with respect to the run time of 
one iteration following the workset size.

 In a VertexCentricIteration, the run time of one iteration should be 
 proportional to the size of the workset
 

 Key: FLINK-2548
 URL: https://issues.apache.org/jira/browse/FLINK-2548
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Gabor Gevay
Assignee: Gabor Gevay

 Currently, the performance of vertex centric iteration is suboptimal in those 
 iterations where the workset is small, because the complexity of one 
 iteration includes the number of edges and vertices of the graph, due to the 
 coGroups:
 VertexCentricIteration.buildMessagingFunction does a coGroup between the 
 edges and the workset, to get the neighbors to the messaging UDF. This is 
 problematic from a performance point of view, because the coGroup UDF gets 
 called on all the edge groups, including those that are not getting any 
 messages.
 An analogous problem is present in 
 VertexCentricIteration.createResultSimpleVertex at the creation of the 
 updates: a coGroup happens between the messages and the solution set, which 
 has the number of vertices of the graph included in its complexity.
 Both of these coGroups could be avoided by doing a join instead (with the 
 same keys that the coGroup uses), and then a groupBy. The complexity of these 
 operations would be dominated by the size of the workset, as opposed to the 
 number of edges or vertices of the graph. The joins should have the edges and 
 the solution set at the build side to achieve this complexity. (They will not 
 be rebuilt at every iteration.)
 I made some experiments with this, and the initial results seem promising. On 
 some workloads, this achieves a 2 times speedup, because later iterations 
 often have quite small worksets, and these get a huge speedup from this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2566) FlinkTopologyContext not populated completely

2015-08-23 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2566:
--

 Summary: FlinkTopologyContext not populated completely
 Key: FLINK-2566
 URL: https://issues.apache.org/jira/browse/FLINK-2566
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor


Currently FlinkTopologyContext is not populated completely. It only contains 
enough information to make the WordCount example work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2563) Gelly's Graph Algorithm Interface is limited

2015-08-23 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708342#comment-14708342
 ] 

Vasia Kalavri commented on FLINK-2563:
--

That's a big limitation, I agree. Mind if I work on this [~andralungu]?

 Gelly's Graph Algorithm Interface is limited
 

 Key: FLINK-2563
 URL: https://issues.apache.org/jira/browse/FLINK-2563
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9, 0.10
Reporter: Andra Lungu

 Right now, Gelly's `GraphAlgorithm` interface only allows users/devs to 
 return the same type of Graph.
 public Graph<K, VV, EV> run(Graph<K, VV, EV> input) throws Exception;
 In numerous cases, one needs to return a single value, or a modified graph. 
 Off the top of my head, say one would like to implement a Triangle Count 
 library method. That takes as input a Graph and returns the total number of 
 triangles. 
 https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java
 With the current Gelly abstractions, something like this cannot be supported. 
 Also if I initially had a Graph of Long, Long, NullValue and my algorithm 
 changed the edge values to type Double, for instance, I would again have 
 created an implementation which is not supported. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708658#comment-14708658
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-133990256
  
Hi @sachingoel0101, when sampling with a fraction it is not easy to verify 
whether the DataSet was sampled with the input fraction. In the test, I take 
5 samples, use the average size to compute the observed fraction, and then 
compare the observed fraction with the input fraction, verifying that their 
difference is not more than 10%. The following case may happen as well: the 
sampler samples the DataSet with the input fraction, but the sampled result 
size is so small or so large that it falls outside our verification bound. 
This happens only with very low probability, say less than 0.001 in this 
test. It should be OK if this failure happens only very occasionally; please 
let me know if you find that it is not.
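
To see why the 10% tolerance should almost never trip, here is a 
self-contained back-of-the-envelope sketch of the average-of-five-samples 
check described above (plain Java with illustrative sizes; this is not the 
actual test code):

{code:java}
import java.util.Random;

public class FractionVerificationSketch {
    public static void main(String[] args) {
        double fraction = 0.2;   // requested sample fraction (illustrative)
        int sourceSize = 10000;  // elements per sampling round (illustrative)
        int rounds = 5;          // average over five samples, as in the test
        Random rnd = new Random();

        long total = 0;
        for (int r = 0; r < rounds; r++) {
            // Bernoulli sampling: keep each element with probability `fraction`.
            for (int i = 0; i < sourceSize; i++) {
                if (rnd.nextDouble() < fraction) {
                    total++;
                }
            }
        }
        double observed = (double) total / (rounds * (double) sourceSize);

        // With 50,000 Bernoulli trials the standard deviation of `observed` is
        // sqrt(0.2 * 0.8 / 50000) ~= 0.0018, so the 10% relative tolerance
        // (0.02 absolute) sits roughly 11 standard deviations out.
        if (Math.abs(observed - fraction) / fraction > 0.10) {
            throw new AssertionError("observed " + observed + " vs requested " + fraction);
        }
        System.out.printf("observed=%.4f requested=%.4f%n", observed, fraction);
    }
}
{code}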


 Create sample operator for Dataset
 --

 Key: FLINK-1901
 URL: https://issues.apache.org/jira/browse/FLINK-1901
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Theodore Vasiloudis
Assignee: Chengxiang Li

 In order to be able to implement Stochastic Gradient Descent and a number of 
 other machine learning algorithms we need to have a way to take a random 
 sample from a Dataset.
 We need to be able to sample with or without replacement from the Dataset, 
 choose the relative or exact size of the sample, set a seed for 
 reproducibility, and support sampling within iterations.
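
For context, a short sketch of the kind of API this issue is driving at; the 
helper names follow the DataSetUtils utilities discussed in the pull request, 
but treat the exact signatures as assumptions of this sketch:

{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.utils.DataSetUtils;

public class SampleSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Relative-size sample: keep each element with probability ~0.3,
        // without replacement, seeded for reproducibility.
        DataSet<Integer> approx = DataSetUtils.sample(data, false, 0.3, 42L);

        // Exact-size sample: sample down to exactly 3 elements.
        DataSet<Integer> exact = DataSetUtils.sampleWithSize(data, false, 3, 42L);

        approx.print();
        exact.print();
    }
}
{code}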



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method

2015-08-23 Thread Andreas Kunft (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708693#comment-14708693
 ] 

Andreas Kunft edited comment on FLINK-2373 at 8/24/15 2:39 AM:
---

Hey, I was just about to open a PR that basically also added another method 
for the remote environment with the extra configuration parameter, just like 
yours. So I guess my PR is now obsolete (you can see my changes here: 
https://github.com/akunft/flink/commit/60240632ed71c072ecf880a586f12fd966412d67).
As far as I can see, there is a difference in the configuration provided for 
the remote environment and the local one: the remote config is only used for 
the default parallelism and the Akka config of the job client, not for 
configuring the cluster itself. The config for local execution covers all the 
configuration.

I think it should be stated clearly in the Javadoc that, in the case of the 
remote environment, the config only applies to the job client and the default 
parallelism. 


was (Author: akunft):
Hey, i was just going to open a PR which had basically also another method for 
the remote environment with the extra configuration parameter, just like yours. 
So I guess, the PR is now obsolete.
As far as I see it, there is a difference in the configuration provided for the 
remote environment and the local one, as the remote config is only used for 
def. parallelism and the akka config for the job client and not for 
configuration of the cluster itself. The config for the local execution covers 
all the configuration.

I think it should be stated clearly in the java doc, that the config is only 
for the jobclient and def. parallelism in case of the remote environment. 

 Add configuration parameter to createRemoteEnvironment method
 -

 Key: FLINK-2373
 URL: https://issues.apache.org/jira/browse/FLINK-2373
 Project: Flink
  Issue Type: Bug
  Components: other
Reporter: Andreas Kunft
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 Currently there is no way to provide a custom configuration upon creation of 
 a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)).
 This leads to errors when the submitted job exceeds the default value for the 
 max. payload size in Akka, as we cannot increase the configuration value 
 (akka.remote.OversizedPayloadException: Discarding oversized payload...)
 Providing an overloaded method with a configuration parameter for the remote 
 environment fixes that.
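
A sketch of the proposed overload in use; the signature mirrors the existing 
createRemoteEnvironment variants and akka.framesize is Flink's knob for the 
Akka payload limit, but the overload itself is the proposal, not the API as 
it stands:

{code:java}
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class RemoteEnvWithConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration clientConfig = new Configuration();
        // Raise the Akka frame size so large job graphs are not dropped with
        // akka.remote.OversizedPayloadException (the default is 10485760b).
        clientConfig.setString("akka.framesize", "104857600b");

        // Proposed overload: host, port, client configuration, jar files.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 6123, clientConfig, "/path/to/job.jar");

        env.fromElements(1, 2, 3).print();
    }
}
{code}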



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method

2015-08-23 Thread Andreas Kunft (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708693#comment-14708693
 ] 

Andreas Kunft commented on FLINK-2373:
--

Hey, I was just about to open a PR that basically also added another method 
for the remote environment with the extra configuration parameter, just like 
yours. So I guess my PR is now obsolete.
As far as I can see, there is a difference in the configuration provided for 
the remote environment and the local one: the remote config is only used for 
the default parallelism and the Akka config of the job client, not for 
configuring the cluster itself, as the config for local execution covers all 
the configuration.

I think it should be stated clearly in the Javadoc that, in the case of the 
remote environment, the config only applies to the job client and the default 
parallelism. 

 Add configuration parameter to createRemoteEnvironment method
 -

 Key: FLINK-2373
 URL: https://issues.apache.org/jira/browse/FLINK-2373
 Project: Flink
  Issue Type: Bug
  Components: other
Reporter: Andreas Kunft
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 Currently there is no way to provide a custom configuration upon creation of 
 a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)).
 This leads to errors when the submitted job exceeds the default value for the 
 max. payload size in Akka, as we cannot increase the configuration value 
 (akka.remote.OversizedPayloadException: Discarding oversized payload...)
 Providing an overloaded method with a configuration parameter for the remote 
 environment fixes that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method

2015-08-23 Thread Andreas Kunft (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708693#comment-14708693
 ] 

Andreas Kunft edited comment on FLINK-2373 at 8/24/15 2:29 AM:
---

Hey, I was just about to open a PR that basically also added another method 
for the remote environment with the extra configuration parameter, just like 
yours. So I guess my PR is now obsolete.
As far as I can see, there is a difference in the configuration provided for 
the remote environment and the local one: the remote config is only used for 
the default parallelism and the Akka config of the job client, not for 
configuring the cluster itself. The config for local execution covers all the 
configuration.

I think it should be stated clearly in the Javadoc that, in the case of the 
remote environment, the config only applies to the job client and the default 
parallelism. 


was (Author: akunft):
Hey, i was just going to open a PR which had basically also another method for 
the remote environment with the extra configuration parameter, just like yours. 
So I guess, the PR is now obsolete.
As far as I see it, there is a difference for the configuration provided for 
the remote environment and the local one, as the remote config is only used for 
def. parallelism and the akka config for the job client and not for 
configuration of the cluster itself, as the config for the local execution 
covers all the configuration.

I think it should be stated clearly in the java doc, that the config is only 
for the jobclient and def. parallelism in case of the remote environment. 

 Add configuration parameter to createRemoteEnvironment method
 -

 Key: FLINK-2373
 URL: https://issues.apache.org/jira/browse/FLINK-2373
 Project: Flink
  Issue Type: Bug
  Components: other
Reporter: Andreas Kunft
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 Currently there is no way to provide a custom configuration upon creation of 
 a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)).
 This leads to errors when the submitted job exceeds the default value for the 
 max. payload size in Akka, as we cannot increase the configuration value 
 (akka.remote.OversizedPayloadException: Discarding oversized payload...)
 Providing an overloaded method with a configuration parameter for the remote 
 environment fixes that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708701#comment-14708701
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-133999705
  
@ChengXiangLi I know that it is hard to verify a random sampler 
implementation. But we need to fix this failing test, because it makes 
verifying other pull requests difficult: some tests of other pull requests 
fail on the K-S test and on the fraction sampling test. There is a [JIRA 
issue](https://issues.apache.org/jira/browse/FLINK-2564) covering this.

I'm testing with an increased number of samples and a larger source size. If 
I get a notable result, I'll post it.


 Create sample operator for Dataset
 --

 Key: FLINK-1901
 URL: https://issues.apache.org/jira/browse/FLINK-1901
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Theodore Vasiloudis
Assignee: Chengxiang Li

 In order to be able to implement Stochastic Gradient Descent and a number of 
 other machine learning algorithms we need to have a way to take a random 
 sample from a Dataset.
 We need to be able to sample with or without replacement from the Dataset, 
 choose the relative or exact size of the sample, set a seed for 
 reproducibility, and support sampling within iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-23 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-133999705
  
@ChengXiangLi I know that it is hard to verify a random sampler 
implementation. But we need to fix this failing test, because it makes 
verifying other pull requests difficult: some tests of other pull requests 
fail on the K-S test and on the fraction sampling test. There is a [JIRA 
issue](https://issues.apache.org/jira/browse/FLINK-2564) covering this.

I'm testing with an increased number of samples and a larger source size. If 
I get a notable result, I'll post it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-23 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-133990256
  
Hi @sachingoel0101, when sampling with a fraction it is not easy to verify 
whether the DataSet was sampled with the input fraction. In the test, I take 
5 samples, use the average size to compute the observed fraction, and then 
compare the observed fraction with the input fraction, verifying that their 
difference is not more than 10%. The following case may happen as well: the 
sampler samples the DataSet with the input fraction, but the sampled result 
size is so small or so large that it falls outside our verification bound. 
This happens only with very low probability, say less than 0.001 in this 
test. It should be OK if this failure happens only very occasionally; please 
let me know if you find that it is not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---