[jira] [Created] (FLINK-3947) Provide low level access to RocksDB state backend

2016-05-20 Thread Elias Levy (JIRA)
Elias Levy created FLINK-3947:
-

 Summary: Provide low level access to RocksDB state backend
 Key: FLINK-3947
 URL: https://issues.apache.org/jira/browse/FLINK-3947
 Project: Flink
  Issue Type: Improvement
  Components: state backends
Affects Versions: 1.0.3
Reporter: Elias Levy


The current state API is limiting and some implementations are not as efficient 
as they could be, particularly when working with large states. For instance, a 
ListState is append only.  You cannot remove values from the list.  And the 
RocksDBListState get() implementation reads all list values from RocksDB 
instead of returning an Iterable that only reads values as needed.

Furthermore, RocksDB is an ordered KV store, yet there is no ordered map state 
API with an ability to iterate over the stored values in order.
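As an illustration of the lazy iteration requested above, here is a minimal sketch (hypothetical class and names, not the RocksDBListState API): each stored byte[] is deserialized only when the consumer actually advances the iterator, instead of materializing the whole list up front.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not Flink code: a list state view that defers
// deserialization until each element is requested.
public class LazyListState<T> implements Iterable<T> {
    private final List<byte[]> storedValues;        // stand-in for the RocksDB-backed values
    private final Function<byte[], T> deserializer;

    public LazyListState(List<byte[]> storedValues, Function<byte[], T> deserializer) {
        this.storedValues = storedValues;
        this.deserializer = deserializer;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<byte[]> raw = storedValues.iterator();
        return new Iterator<T>() {
            public boolean hasNext() { return raw.hasNext(); }
            // Deserialize per element, on demand, instead of reading everything in get()
            public T next() { return deserializer.apply(raw.next()); }
        };
    }

    public static void main(String[] args) {
        LazyListState<String> state = new LazyListState<>(
                List.of("a".getBytes(), "b".getBytes()),
                bytes -> new String(bytes));
        for (String value : state) System.out.println(value);
    }
}
```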






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3922) Infinite recursion on TypeExtractor

2016-05-20 Thread Flavio Pompermaier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294193#comment-15294193
 ] 

Flavio Pompermaier commented on FLINK-3922:
---

I confirm that this patch fixed my test!
However, I'd add a test to make sure this problem is not reintroduced in the future.
Thanks, Timo!

> Infinite recursion on TypeExtractor
> ---
>
> Key: FLINK-3922
> URL: https://issues.apache.org/jira/browse/FLINK-3922
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: 1.0.2
>Reporter: Flavio Pompermaier
>Assignee: Timo Walther
>Priority: Critical
>
> This program cause a StackOverflow (infinite recursion) in the TypeExtractor:
> {code:title=TypeSerializerStackOverflowOnRecursivePojo.java|borderStyle=solid}
> public class TypeSerializerStackOverflowOnRecursivePojo {
> 	public static class RecursivePojo<K, V> implements Serializable {
> 		private static final long serialVersionUID = 1L;
> 
> 		private RecursivePojo<K, V> parent;
> 		public RecursivePojo() {}
> 		public RecursivePojo(K k, V v) {
> 		}
> 		public RecursivePojo<K, V> getParent() {
> 			return parent;
> 		}
> 		public void setParent(RecursivePojo<K, V> parent) {
> 			this.parent = parent;
> 		}
> 	}
> 	public static class TypedTuple extends Tuple3<String, String, RecursivePojo<String, Map<String, String>>> {
> 		private static final long serialVersionUID = 1L;
> 	}
> 
> 	public static void main(String[] args) throws Exception {
> 		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> 		env.fromCollection(Arrays.asList(new RecursivePojo<String, Map<String, String>>("test", new HashMap<>())))
> 			.map(t -> {TypedTuple ret = new TypedTuple(); ret.setFields("1", "1", t); return ret;})
> 			.returns(TypedTuple.class)
> 			.print();
> 	}
> }
> {code}
> The thrown Exception is the following:
> {code:title=Exception thrown}
> Exception in thread "main" java.lang.StackOverflowError
>   at 
> sun.reflect.generics.parser.SignatureParser.parsePackageNameAndSimpleClassTypeSignature(SignatureParser.java:328)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseClassTypeSignature(SignatureParser.java:310)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:289)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:283)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseTypeSignature(SignatureParser.java:485)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseReturnType(SignatureParser.java:627)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseMethodTypeSignature(SignatureParser.java:577)
>   at 
> sun.reflect.generics.parser.SignatureParser.parseMethodSig(SignatureParser.java:171)
>   at 
> sun.reflect.generics.repository.ConstructorRepository.parse(ConstructorRepository.java:55)
>   at 
> sun.reflect.generics.repository.ConstructorRepository.parse(ConstructorRepository.java:43)
>   at 
> sun.reflect.generics.repository.AbstractRepository.<init>(AbstractRepository.java:74)
>   at 
> sun.reflect.generics.repository.GenericDeclRepository.<init>(GenericDeclRepository.java:49)
>   at 
> sun.reflect.generics.repository.ConstructorRepository.<init>(ConstructorRepository.java:51)
>   at 
> sun.reflect.generics.repository.MethodRepository.<init>(MethodRepository.java:46)
>   at 
> sun.reflect.generics.repository.MethodRepository.make(MethodRepository.java:59)
>   at java.lang.reflect.Method.getGenericInfo(Method.java:102)
>   at java.lang.reflect.Method.getGenericReturnType(Method.java:255)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.isValidPojoField(TypeExtractor.java:1610)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1671)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1559)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:732)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1678)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1559)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:732)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1678)
>   at 
> 
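The trace shows `analyzePojo` and `createTypeInfoWithTypeHierarchy` re-entering each other on the self-referencing field. A minimal sketch of the kind of guard such a fix needs (hypothetical, not the actual TypeExtractor patch): track the types already on the current traversal path and stop on re-entry.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a field-type walker that terminates on recursive
// POJO references instead of recursing forever.
public class TypeWalker {
    static List<Class<?>> walk(Class<?> type, Set<Class<?>> inProgress, List<Class<?>> visited) {
        if (!inProgress.add(type)) {
            return visited; // type is already on this path (e.g. RecursivePojo.parent): stop
        }
        visited.add(type);
        for (java.lang.reflect.Field f : type.getDeclaredFields()) {
            Class<?> fieldType = f.getType();
            if (!fieldType.isPrimitive()) {
                walk(fieldType, inProgress, visited);
            }
        }
        inProgress.remove(type);
        return visited;
    }

    static class Node { Node parent; } // self-referencing field, like RecursivePojo

    public static void main(String[] args) {
        // Without the inProgress guard this traversal would never terminate.
        System.out.println(walk(Node.class, new HashSet<>(), new ArrayList<>()));
    }
}
```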

[jira] [Created] (FLINK-3946) State API Should Support Data Expiration

2016-05-20 Thread Elias Levy (JIRA)
Elias Levy created FLINK-3946:
-

 Summary: State API Should Support Data Expiration
 Key: FLINK-3946
 URL: https://issues.apache.org/jira/browse/FLINK-3946
 Project: Flink
  Issue Type: Improvement
  Components: state backends
Affects Versions: 1.0.3
Reporter: Elias Levy


The state API should support data expiration.  Consider a custom data stream 
operator that operates on a keyed stream and maintains a cache of items in its 
keyed state.  The operator can expire items in the state based on event time or 
processing time.  But as the state is keyed, if the operator is never invoked 
again for a previously observed key, the state associated with that key will 
never be expired, leading the overall job state to grow indefinitely.
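A minimal sketch of the timestamp-based expiration described above (hypothetical class, not a Flink API): each value carries its write time, and reads drop entries older than the TTL. Note that a key never read again still lingers, which is exactly the unbounded-growth problem the issue raises.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: state entries paired with a write timestamp so
// reads can expire values older than a time-to-live.
public class TtlCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long writeTime;
        Entry(V value, long writeTime) { this.value = value; this.writeTime = writeTime; }
    }

    private final Map<K, Entry<V>> state = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(K key, V value, long now) {
        state.put(key, new Entry<>(value, now));
    }

    // Removes and hides the entry once it outlives its TTL; keys that are
    // never accessed again are still retained, illustrating the problem.
    public V get(K key, long now) {
        Entry<V> e = state.get(key);
        if (e == null) return null;
        if (now - e.writeTime > ttlMillis) {
            state.remove(key);
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(100);
        cache.put("key", "value", 0);
        System.out.println(cache.get("key", 50));   // prints "value" (within TTL)
        System.out.println(cache.get("key", 200));  // prints "null" (expired)
    }
}
```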







[jira] [Closed] (FLINK-3928) Potential overflow due to 32-bit int arithmetic

2016-05-20 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3928.
-
Resolution: Fixed

Fixed in 723766993ce659f7316c890fcce7d0633957e448

> Potential overflow due to 32-bit int arithmetic
> ---
>
> Key: FLINK-3928
> URL: https://issues.apache.org/jira/browse/FLINK-3928
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.1.0
>
>
> The following pattern occurs in both LocalClusteringCoefficient.java and 
> TriangleListing.java :
> {code}
> int scale = parameters.getInt("scale", DEFAULT_SCALE);
> ...
> long vertexCount = 1 << scale;
> {code}
> "1 << scale" may overflow with type "int" due to the use of 32-bit arithmetic
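The overflow is easy to demonstrate: Java masks an int shift distance to its low 5 bits, so `1 << scale` silently wraps for scale >= 31, while widening to a long literal gives the intended count (class name is illustrative):

```java
public class ShiftOverflow {
    // int shift: the shift distance is taken mod 32, so scale >= 31 is wrong
    static long vertexCountInt(int scale) {
        return 1 << scale;
    }

    // long shift: correct for scale up to 62, the fix applied in the issue
    static long vertexCountLong(int scale) {
        return 1L << scale;
    }

    public static void main(String[] args) {
        System.out.println(vertexCountInt(33));  // prints 2 (33 mod 32 = 1)
        System.out.println(vertexCountLong(33)); // prints 8589934592
    }
}
```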





[jira] [Closed] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3780.
-
Resolution: Fixed

Implemented in 8ed368582a8550ccb5941dfd4e8412dad8ca34c0

> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.
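For reference, the Jaccard score of two neighbor sets is |A ∩ B| / |A ∪ B|; a minimal standalone sketch (not the Gelly implementation):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the Jaccard similarity score on neighbor sets.
public class Jaccard {
    static double similarity(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        // neighbors {1,2,3} vs {2,3,4}: intersection 2, union 4
        System.out.println(similarity(Set.of(1, 2, 3), Set.of(2, 3, 4))); // prints 0.5
    }
}
```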





[jira] [Commented] (FLINK-3928) Potential overflow due to 32-bit int arithmetic

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294118#comment-15294118
 ] 

ASF GitHub Bot commented on FLINK-3928:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2006


> Potential overflow due to 32-bit int arithmetic
> ---
>
> Key: FLINK-3928
> URL: https://issues.apache.org/jira/browse/FLINK-3928
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.1.0
>
>
> The following pattern occurs in both LocalClusteringCoefficient.java and 
> TriangleListing.java :
> {code}
> int scale = parameters.getInt("scale", DEFAULT_SCALE);
> ...
> long vertexCount = 1 << scale;
> {code}
> "1 << scale" may overflow with type "int" due to the use of 32-bit arithmetic





[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294119#comment-15294119
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1980


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.





[GitHub] flink pull request: [FLINK-3928] [gelly] Potential overflow due to...

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2006


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3780] [gelly] Jaccard Similarity

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1980




[jira] [Created] (FLINK-3945) Degree annotation for directed graphs

2016-05-20 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3945:
-

 Summary: Degree annotation for directed graphs
 Key: FLINK-3945
 URL: https://issues.apache.org/jira/browse/FLINK-3945
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor
 Fix For: 1.1.0


There is a third degree count for vertices in directed graphs, which is the 
distinct count of out- and in-neighbors. This also adds edge annotation of the 
vertex degrees for directed graphs.
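The proposed third count is the size of the union of the out- and in-neighbor sets, which differs from outDegree + inDegree exactly when an edge exists in both directions. A minimal sketch (illustrative only, not the Gelly annotation code):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: distinct-neighbor count for a directed graph vertex.
public class DistinctDegree {
    static int distinctNeighborCount(Set<Integer> out, Set<Integer> in) {
        Set<Integer> union = new HashSet<>(out);
        union.addAll(in);
        return union.size(); // |out(v) ∪ in(v)|
    }

    public static void main(String[] args) {
        // out-neighbors {1,2}, in-neighbors {2,3}: 2 is counted once
        System.out.println(distinctNeighborCount(Set.of(1, 2), Set.of(2, 3))); // prints 3
    }
}
```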





[jira] [Commented] (FLINK-3928) Potential overflow due to 32-bit int arithmetic

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293692#comment-15293692
 ] 

ASF GitHub Bot commented on FLINK-3928:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/2006#issuecomment-220658635
  
Merging ...


> Potential overflow due to 32-bit int arithmetic
> ---
>
> Key: FLINK-3928
> URL: https://issues.apache.org/jira/browse/FLINK-3928
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.1.0
>
>
> The following pattern occurs in both LocalClusteringCoefficient.java and 
> TriangleListing.java :
> {code}
> int scale = parameters.getInt("scale", DEFAULT_SCALE);
> ...
> long vertexCount = 1 << scale;
> {code}
> "1 << scale" may overflow with type "int" due to the use of 32-bit arithmetic





[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293698#comment-15293698
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1980#issuecomment-220658718
  
Thanks for the reviews @vasia! Merging ...


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.





[GitHub] flink pull request: [FLINK-3928] [gelly] Potential overflow due to...

2016-05-20 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/2006#issuecomment-220658635
  
Merging ...




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293669#comment-15293669
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-220656004
  
Stanford provides several old graph datasets at 
https://snap.stanford.edu/data/index.html which might prove a better standard 
for benchmarking.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature 
> branch can be found here: (https://github.com/JavidMayar/flink/commits/HITS)
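For context, HITS alternates authority updates (each vertex sums the hub scores of its in-neighbors) and hub updates (each vertex sums the authority scores of its out-neighbors), normalizing after each round. A minimal standalone sketch (not the Gelly implementation):

```java
import java.util.Arrays;

// Illustrative HITS sketch over an adjacency list; out[u] lists the targets of u.
public class Hits {
    static double[][] run(int[][] out, int n, int iterations) {
        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] newAuth = new double[n];
            for (int u = 0; u < n; u++)
                for (int v : out[u]) newAuth[v] += hub[u]; // authority: hub scores of in-neighbors
            double[] newHub = new double[n];
            for (int u = 0; u < n; u++)
                for (int v : out[u]) newHub[u] += newAuth[v]; // hub: authority scores of out-neighbors
            auth = normalize(newAuth);
            hub = normalize(newHub);
        }
        return new double[][]{hub, auth};
    }

    static double[] normalize(double[] x) {
        double s = 0;
        for (double v : x) s += v * v;
        s = Math.sqrt(s);
        if (s == 0) return x;
        for (int i = 0; i < x.length; i++) x[i] /= s;
        return x;
    }

    public static void main(String[] args) {
        int[][] out = {{1}, {}, {1}}; // vertices 0 and 2 both point at 1
        double[][] scores = run(out, 3, 5);
        System.out.println("hub=" + Arrays.toString(scores[0]));
        System.out.println("auth=" + Arrays.toString(scores[1]));
    }
}
```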





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-220656004
  
Stanford provides several old graph datasets at 
https://snap.stanford.edu/data/index.html which might prove a better standard 
for benchmarking.




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293667#comment-15293667
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64070513
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements 
MapFunction, String> {
--- End diff --

A boolean is stored as a byte, as would be a bitmask, which would allow the 
edge set to be compressed for bidirectional edges. I'm not sure how common 
bidirectional edges are in real-world datasets.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature 
> branch can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64070513
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements 
MapFunction, String> {
--- End diff --

A boolean is stored as a byte, as would be a bitmask, which would allow the 
edge set to be compressed for bidirectional edges. I'm not sure how common 
bidirectional edges are in real-world datasets.




[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows

2016-05-20 Thread Gabor Gevay (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293629#comment-15293629
 ] 

Gabor Gevay commented on FLINK-2144:


But why couldn't we provide FoldFunctions for these in the API?

> Incremental count, average, and variance for windows
> 
>
> Key: FLINK-2144
> URL: https://issues.apache.org/jira/browse/FLINK-2144
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Gabor Gevay
>Priority: Minor
>  Labels: statistics
>
> By count I mean the number of elements in the window.
> These can be implemented very efficiently building on FLINK-2143:
> Store: O(1)
> Evict: O(1)
> emitWindow: O(1)
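These O(1) bounds follow from keeping running aggregates: store and evict each adjust a count, a sum, and a sum of squares, and the average and variance are derived on demand. A minimal sketch (illustrative, not a Flink API; the E[x²] − E[x]² form can be numerically unstable, where Welford-style updates would be more robust):

```java
// Illustrative sketch: O(1) store/evict window statistics via running sums.
public class WindowStats {
    private long count;
    private double sum, sumSq;

    public void store(double x) { count++; sum += x; sumSq += x * x; }  // O(1)
    public void evict(double x) { count--; sum -= x; sumSq -= x * x; }  // O(1)

    public long count() { return count; }
    public double average() { return sum / count; }

    // population variance: E[x^2] - E[x]^2
    public double variance() {
        double mean = sum / count;
        return sumSq / count - mean * mean;
    }

    public static void main(String[] args) {
        WindowStats w = new WindowStats();
        w.store(1); w.store(2); w.store(3);
        System.out.println(w.count() + " " + w.average() + " " + w.variance());
    }
}
```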





[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows

2016-05-20 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293609#comment-15293609
 ] 

Aljoscha Krettek commented on FLINK-2144:
-

Right now we don't really have a place in the docs where we would put such a 
thing. We could either add such a section or add an example to the built-in 
Flink examples.

> Incremental count, average, and variance for windows
> 
>
> Key: FLINK-2144
> URL: https://issues.apache.org/jira/browse/FLINK-2144
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Gabor Gevay
>Priority: Minor
>  Labels: statistics
>
> By count I mean the number of elements in the window.
> These can be implemented very efficiently building on FLINK-2143:
> Store: O(1)
> Evict: O(1)
> emitWindow: O(1)





[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows

2016-05-20 Thread Trevor Grant (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293601#comment-15293601
 ] 

Trevor Grant commented on FLINK-2144:
-

Implement these as convenience functions and call out in the docs that custom 
implementations are probably more efficient?

OR

Update the docs with a nice, robust example showing how to do this.



> Incremental count, average, and variance for windows
> 
>
> Key: FLINK-2144
> URL: https://issues.apache.org/jira/browse/FLINK-2144
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Gabor Gevay
>Priority: Minor
>  Labels: statistics
>
> By count I mean the number of elements in the window.
> These can be implemented very efficiently building on FLINK-2143:
> Store: O(1)
> Evict: O(1)
> emitWindow: O(1)





[jira] [Closed] (FLINK-3893) LeaderChangeStateCleanupTest times out

2016-05-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3893.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

9b8de6a6cb3b1cc9ebeb623371e7fef3a6cb763d

> LeaderChangeStateCleanupTest times out
> --
>
> Key: FLINK-3893
> URL: https://issues.apache.org/jira/browse/FLINK-3893
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> {{cluster.waitForTaskManagersToBeRegistered();}} needs to be replaced by 
> {{cluster.waitForTaskManagersToBeRegistered(timeout);}}
> {noformat}
> testStateCleanupAfterListenerNotification(org.apache.flink.runtime.leaderelection.LeaderChangeStateCleanupTest)
>   Time elapsed: 10.106 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Futures timed out after [1 
> milliseconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>   at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
>   at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
>   at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.ready(package.scala:86)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:455)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:439)
>   at 
> org.apache.flink.runtime.leaderelection.LeaderChangeStateCleanupTest.testStateCleanupAfterListenerNotification(LeaderChangeStateCleanupTest.java:181)
> {noformat}





[jira] [Commented] (FLINK-3938) Yarn tests don't run on the current master

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293572#comment-15293572
 ] 

ASF GitHub Bot commented on FLINK-3938:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2012


> Yarn tests don't run on the current master
> --
>
> Key: FLINK-3938
> URL: https://issues.apache.org/jira/browse/FLINK-3938
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> Independently of FLINK-3909, I just discovered that the Yarn tests don't run 
> on the current master (09b428b).
> {noformat}
> [INFO] 
> 
> [INFO] Building flink-yarn-tests 1.1-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- maven-checkstyle-plugin:2.16:check (validate) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Source directory: 
> /home/travis/build/apache/flink/flink-yarn-tests/src/main/scala added.
> [INFO] 
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory 
> /home/travis/build/apache/flink/flink-yarn-tests/src/main/resources
> [INFO] Copying 3 resources
> [INFO] 
> [INFO] --- scala-maven-plugin:3.1.4:compile (scala-compile-first) @ 
> flink-yarn-tests_2.10 ---
> [INFO] No sources to compile
> [INFO] 
> [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
> flink-yarn-tests_2.10 ---
> [INFO] No sources to compile
> [INFO] 
> [INFO] --- build-helper-maven-plugin:1.7:add-test-source (add-test-source) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Test Source directory: 
> /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala added.
> [INFO] 
> [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] Copying 1 resource
> [INFO] Copying 3 resources
> [INFO] 
> [INFO] --- scala-maven-plugin:3.1.4:testCompile (scala-test-compile) @ 
> flink-yarn-tests_2.10 ---
> [INFO] /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala:-1: 
> info: compiling
> [INFO] Compiling 2 source files to 
> /home/travis/build/apache/flink/flink-yarn-tests/target/test-classes at 
> 1463615798796
> [INFO] prepare-compile in 0 s
> [INFO] compile in 9 s
> [INFO] 
> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO] 
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Surefire report directory: 
> /home/travis/build/apache/flink/flink-yarn-tests/target/surefire-reports
> [WARNING] The system property log4j.configuration is configured twice! The 
> property appears in <argLine/> and any of <systemPropertyVariables/>, 
> <systemProperties/> or user property.
> ---
>  T E S T S
> ---
> Results :
> Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
> {noformat}





[jira] [Commented] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293576#comment-15293576
 ] 

ASF GitHub Bot commented on FLINK-3909:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/2003#issuecomment-220642391
  
Merging...we need to fix the RocksDB issue afterwards. In the worst case, 
we'll comment out the relevant test cases for now. The test which seems to 
cause the segfaults is `EventTimeWindowCheckpointingITCase`.


> Maven Failsafe plugin may report SUCCESS on failed tests
> 
>
> Key: FLINK-3909
> URL: https://issues.apache.org/jira/browse/FLINK-3909
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> The following build completed successfully on Travis but there are actually 
> test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402





[jira] [Commented] (FLINK-3893) LeaderChangeStateCleanupTest times out

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293571#comment-15293571
 ] 

ASF GitHub Bot commented on FLINK-3893:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2009


> LeaderChangeStateCleanupTest times out
> --
>
> Key: FLINK-3893
> URL: https://issues.apache.org/jira/browse/FLINK-3893
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> {{cluster.waitForTaskManagersToBeRegistered();}} needs to be replaced by 
> {{cluster.waitForTaskManagersToBeRegistered(timeout);}}
> {noformat}
> testStateCleanupAfterListenerNotification(org.apache.flink.runtime.leaderelection.LeaderChangeStateCleanupTest)
>   Time elapsed: 10.106 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Futures timed out after [1 
> milliseconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>   at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
>   at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
>   at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.ready(package.scala:86)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:455)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:439)
>   at 
> org.apache.flink.runtime.leaderelection.LeaderChangeStateCleanupTest.testStateCleanupAfterListenerNotification(LeaderChangeStateCleanupTest.java:181)
> {noformat}





[jira] [Commented] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293573#comment-15293573
 ] 

ASF GitHub Bot commented on FLINK-3909:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2003


> Maven Failsafe plugin may report SUCCESS on failed tests
> 
>
> Key: FLINK-3909
> URL: https://issues.apache.org/jira/browse/FLINK-3909
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> The following build completed successfully on Travis but there are actually 
> test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402
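The linked PR title ("update Maven Failsafe version") suggests the fix is simply pinning a newer Failsafe release in the build. A hypothetical `pom.xml` fragment (the version value is an assumption for illustration, not taken from the PR):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <!-- hypothetical version: any release new enough to fail the
       build when integration tests fail -->
  <version>2.19.1</version>
</plugin>
```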





[jira] [Commented] (FLINK-3892) ConnectionUtils may die with NullPointerException

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293570#comment-15293570
 ] 

ASF GitHub Bot commented on FLINK-3892:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2008


> ConnectionUtils may die with NullPointerException
> -
>
> Key: FLINK-3892
> URL: https://issues.apache.org/jira/browse/FLINK-3892
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.1.0, 1.0.3
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> If an invalid hostname is specified or the hostname can't be resolved from 
> the current interface, {{ConnectionUtils.findAddressUsingStrategy}} may throw 
> a {{NullPointerException}}: accessing the {{InetAddress}} of an 
> {{InetSocketAddress}} returns null when the host could not be resolved.
> The solution is to abort the attempt to find the local address if the host 
> couldn't be resolved.
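A minimal, self-contained sketch of the failure mode and the proposed guard. This is not Flink's actual {{ConnectionUtils}} code; the class and method names below are illustrative:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Illustrative only: demonstrates why an unresolved InetSocketAddress
// leads to a NullPointerException, and the "abort the attempt" guard.
public class ResolveGuard {

    // Returns the resolved address, or null to signal "abort this attempt"
    // when the host could not be resolved.
    static InetAddress safeAddress(InetSocketAddress socketAddress) {
        if (socketAddress.isUnresolved()) {
            // getAddress() returns null for unresolved addresses; calling
            // any method on that null result is the reported NPE.
            return null;
        }
        return socketAddress.getAddress();
    }

    public static void main(String[] args) {
        InetSocketAddress bad =
                InetSocketAddress.createUnresolved("no-such-host.invalid", 6123);
        System.out.println("getAddress() == null: " + (bad.getAddress() == null));
        System.out.println("guard aborts cleanly:  " + (safeAddress(bad) == null));
    }
}
```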





[GitHub] flink pull request: [FLINK-3909] update Maven Failsafe version

2016-05-20 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/2003#issuecomment-220642391
  
Merging...we need to fix the RocksDB issue afterwards. In the worst case, 
we'll comment out the relevant test cases for now. The test which seems to 
cause the segfaults is `EventTimeWindowCheckpointingITCase`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3938) Yarn tests don't run on the current master

2016-05-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3938.
-
Resolution: Fixed

9a4fdd5fb3a7097035e74b0bf685553bbfdf7f43

> Yarn tests don't run on the current master
> --
>
> Key: FLINK-3938
> URL: https://issues.apache.org/jira/browse/FLINK-3938
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> Independently of FLINK-3909, I just discovered that the Yarn tests don't run 
> on the current master (09b428b).
> {noformat}
> [INFO] 
> 
> [INFO] Building flink-yarn-tests 1.1-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- maven-checkstyle-plugin:2.16:check (validate) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Source directory: 
> /home/travis/build/apache/flink/flink-yarn-tests/src/main/scala added.
> [INFO] 
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
> flink-yarn-tests_2.10 ---
> [INFO] 
> [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory 
> /home/travis/build/apache/flink/flink-yarn-tests/src/main/resources
> [INFO] Copying 3 resources
> [INFO] 
> [INFO] --- scala-maven-plugin:3.1.4:compile (scala-compile-first) @ 
> flink-yarn-tests_2.10 ---
> [INFO] No sources to compile
> [INFO] 
> [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
> flink-yarn-tests_2.10 ---
> [INFO] No sources to compile
> [INFO] 
> [INFO] --- build-helper-maven-plugin:1.7:add-test-source (add-test-source) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Test Source directory: 
> /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala added.
> [INFO] 
> [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] Copying 1 resource
> [INFO] Copying 3 resources
> [INFO] 
> [INFO] --- scala-maven-plugin:3.1.4:testCompile (scala-test-compile) @ 
> flink-yarn-tests_2.10 ---
> [INFO] /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala:-1: 
> info: compiling
> [INFO] Compiling 2 source files to 
> /home/travis/build/apache/flink/flink-yarn-tests/target/test-classes at 
> 1463615798796
> [INFO] prepare-compile in 0 s
> [INFO] compile in 9 s
> [INFO] 
> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Nothing to compile - all classes are up to date
> [INFO] 
> [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ 
> flink-yarn-tests_2.10 ---
> [INFO] Surefire report directory: 
> /home/travis/build/apache/flink/flink-yarn-tests/target/surefire-reports
> [WARNING] The system property log4j.configuration is configured twice! The 
> property appears in <argLine/> and any of <systemPropertyVariables>, 
> <systemProperties> or user property.
> ---
>  T E S T S
> ---
> Results :
> Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
> {noformat}





[jira] [Closed] (FLINK-3892) ConnectionUtils may die with NullPointerException

2016-05-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3892.
-
Resolution: Fixed

5dfb8a013cbf769bee641c04ebc666a8dc2f3e14

> ConnectionUtils may die with NullPointerException
> -
>
> Key: FLINK-3892
> URL: https://issues.apache.org/jira/browse/FLINK-3892
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.1.0, 1.0.3
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> If an invalid hostname is specified or the hostname can't be resolved from 
> the current interface, {{ConnectionUtils.findAddressUsingStrategy}} may throw 
> a {{NullPointerException}}: accessing the {{InetAddress}} of an 
> {{InetSocketAddress}} returns null when the host could not be resolved.
> The solution is to abort the attempt to find the local address if the host 
> couldn't be resolved.





[GitHub] flink pull request: [FLINK-3938] re-enable Yarn tests

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2012




[jira] [Closed] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests

2016-05-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3909.
-
Resolution: Fixed

38698c0b101cbb48f8c10adf4060983ac07e2f4b

> Maven Failsafe plugin may report SUCCESS on failed tests
> 
>
> Key: FLINK-3909
> URL: https://issues.apache.org/jira/browse/FLINK-3909
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> The following build completed successfully on Travis but there are actually 
> test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402





[GitHub] flink pull request: [FLINK-3893] improve LeaderChangeStateCleanupT...

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2009




[GitHub] flink pull request: [FLINK-3892] ConnectionUtils may die with Null...

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2008




[GitHub] flink pull request: [FLINK-3909] update Maven Failsafe version

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2003




[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows

2016-05-20 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293562#comment-15293562
 ] 

Aljoscha Krettek commented on FLINK-2144:
-

I think this can easily be done by users in a window with a {{FoldFunction}}. 
Doing it in a generic way as in the aggregation functions on {{WindowedStream}} 
will never be as fast as a custom implementation.

I would vote to close this issue.
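A {{FoldFunction}}-style accumulator along those lines can be sketched in plain Java (not the Flink API; this just shows that count, sum and sum of squares give O(1) store, evict and emit, matching the bounds in the issue):

```java
// Plain-Java sketch of incremental window statistics. Tracking count,
// sum and sum of squares makes every operation constant-time.
public class IncrementalStats {
    private long n;
    private double sum;
    private double sumSq;

    public void store(double x) { n++; sum += x; sumSq += x * x; }   // O(1)
    public void evict(double x) { n--; sum -= x; sumSq -= x * x; }   // O(1)

    public long count() { return n; }
    public double mean() { return sum / n; }
    // Population variance; note this formulation can lose precision for
    // large values (Welford's algorithm is the numerically stable choice).
    public double variance() { return sumSq / n - mean() * mean(); }

    public static void main(String[] args) {
        IncrementalStats s = new IncrementalStats();
        for (double x : new double[] {1, 2, 3, 4}) {
            s.store(x);          // element enters the window
        }
        s.evict(1);              // element leaves the window -> {2, 3, 4}
        System.out.println(s.count() + " " + s.mean() + " " + s.variance());
    }
}
```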

> Incremental count, average, and variance for windows
> 
>
> Key: FLINK-2144
> URL: https://issues.apache.org/jira/browse/FLINK-2144
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Gabor Gevay
>Priority: Minor
>  Labels: statistics
>
> By count I mean the number of elements in the window.
> These can be implemented very efficiently building on FLINK-2143:
> Store: O(1)
> Evict: O(1)
> emitWindow: O(1)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64056511
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements MapFunction, String> {
--- End diff --

I mean that they had a small difference in running time.




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293519#comment-15293519
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-220633090
  
@vasia vertex num: 1, edge num: 3; vertex num: 3, edge num: 
10.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread gallenvara
Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-220633090
  
@vasia vertex num: 1, edge num: 3; vertex num: 3, edge num: 
10.




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293511#comment-15293511
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64056511
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements MapFunction, String> {
--- End diff --

I mean that they had a small difference in running time.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293501#comment-15293501
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64056167
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) {
				iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue());
			}
			for (Edge edge : getEdges()) {
-				K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget();
-				K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource();
-				double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue();
-
-				if (!messageTarget.equals(vertex.getId())) {
-					if (getSuperstepNumber() != maxIteration) {
-						sendMessageTo(messageTarget, messageValue / iterationValueSum);
-
-						// in order to make every vertex updated
-						sendMessageTo(messageSource, 0.0);
+				if (getSuperstepNumber() != maxIteration) {
+					if (getSuperstepNumber() % 2 == 1) {
+						if (edge.getValue().equals("Authority")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum);
+						}
					} else {
-						sendMessageTo(messageSource, iterationValueSum);
+						if (edge.getValue().equals("Hub")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum);
+						}
+					}
+
+					// make all the vertices be updated
+					sendMessageTo(edge.getSource(), 0.0);
--- End diff --

Fixed with rebasing the previous commit.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64056167
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) {
				iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue());
			}
			for (Edge edge : getEdges()) {
-				K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget();
-				K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource();
-				double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue();
-
-				if (!messageTarget.equals(vertex.getId())) {
-					if (getSuperstepNumber() != maxIteration) {
-						sendMessageTo(messageTarget, messageValue / iterationValueSum);
-
-						// in order to make every vertex updated
-						sendMessageTo(messageSource, 0.0);
+				if (getSuperstepNumber() != maxIteration) {
+					if (getSuperstepNumber() % 2 == 1) {
+						if (edge.getValue().equals("Authority")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum);
+						}
					} else {
-						sendMessageTo(messageSource, iterationValueSum);
+						if (edge.getValue().equals("Hub")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum);
+						}
+					}
+
+					// make all the vertices be updated
+					sendMessageTo(edge.getSource(), 0.0);
--- End diff --

Fixed with rebasing the previous commit.




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293499#comment-15293499
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64056076
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements MapFunction, String> {
--- End diff --

Before committing the newest code, I tested the edges with a `String` label 
and a `boolean` label, and found that the `boolean` label is not faster than 
the `String` label. So I finally used "hub" & "authority", which is also very 
intuitive.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293488#comment-15293488
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64055156
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) {
				iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue());
			}
			for (Edge edge : getEdges()) {
-				K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget();
-				K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource();
-				double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue();
-
-				if (!messageTarget.equals(vertex.getId())) {
-					if (getSuperstepNumber() != maxIteration) {
-						sendMessageTo(messageTarget, messageValue / iterationValueSum);
-
-						// in order to make every vertex updated
-						sendMessageTo(messageSource, 0.0);
+				if (getSuperstepNumber() != maxIteration) {
+					if (getSuperstepNumber() % 2 == 1) {
+						if (edge.getValue().equals("Authority")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum);
+						}
					} else {
-						sendMessageTo(messageSource, iterationValueSum);
+						if (edge.getValue().equals("Hub")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum);
+						}
+					}
+
+					// make all the vertices be updated
+					sendMessageTo(edge.getSource(), 0.0);
--- End diff --

Sorry, this line should be removed. :)
In the previous commit, without edge labels, I used the same edge for hub and 
authority updates, and this line kept every vertex updated. But in the newest 
implementation with edge labels, we can drop it because the added edges do the 
same work.
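For reference, the alternating normalized updates this diff implements can be sketched outside Flink as a plain power iteration. Everything below is illustrative (the graph, class, and method names are not from the PR); only the structure mirrors the PR: authority updates on odd steps, hub updates on even steps, each normalized by the square root of the squared sum, as in `Math.sqrt(getPreviousIterationAggregate("updatedValueSum"))`:

```java
import java.util.Arrays;

// Non-Flink sketch of HITS: authority scores sum hub scores over incoming
// edges, hub scores sum authority scores over outgoing edges, and each
// vector is normalized to unit length after every update.
public class HitsSketch {

    /** Returns {authority, hub} score arrays after the given iterations. */
    static double[][] hits(int[][] edges, int numVertices, int iterations) {
        double[] hub = new double[numVertices];
        double[] auth = new double[numVertices];
        Arrays.fill(hub, 1.0);
        Arrays.fill(auth, 1.0);
        for (int i = 0; i < iterations; i++) {
            Arrays.fill(auth, 0.0);
            for (int[] e : edges) {
                auth[e[1]] += hub[e[0]];   // "Authority" step: src -> dst
            }
            normalize(auth);
            Arrays.fill(hub, 0.0);
            for (int[] e : edges) {
                hub[e[0]] += auth[e[1]];   // "Hub" step: collect from targets
            }
            normalize(hub);
        }
        return new double[][] {auth, hub};
    }

    private static void normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);   // mirrors the PR's aggregate sqrt
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
    }

    public static void main(String[] args) {
        // Tiny hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2
        double[][] scores = hits(new int[][] {{0, 1}, {0, 2}, {1, 2}}, 3, 20);
        System.out.println("authority: " + Arrays.toString(scores[0]));
        System.out.println("hub:       " + Arrays.toString(scores[1]));
    }
}
```

Vertex 0 has no incoming edges, so its authority score stays zero; vertex 2 has no outgoing edges, so its hub score stays zero.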


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64055156
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) {
				iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue());
			}
			for (Edge edge : getEdges()) {
-				K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget();
-				K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource();
-				double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue();
-
-				if (!messageTarget.equals(vertex.getId())) {
-					if (getSuperstepNumber() != maxIteration) {
-						sendMessageTo(messageTarget, messageValue / iterationValueSum);
-
-						// in order to make every vertex updated
-						sendMessageTo(messageSource, 0.0);
+				if (getSuperstepNumber() != maxIteration) {
+					if (getSuperstepNumber() % 2 == 1) {
+						if (edge.getValue().equals("Authority")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum);
+						}
					} else {
-						sendMessageTo(messageSource, iterationValueSum);
+						if (edge.getValue().equals("Hub")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum);
+						}
+					}
+
+					// make all the vertices be updated
+					sendMessageTo(edge.getSource(), 0.0);
--- End diff --

Sorry, this line should be removed. :)
In the previous commit, without edge labels, I used the same edge for hub and 
authority updates, and this line kept every vertex updated. But in the newest 
implementation with edge labels, we can drop it because the added edges do the 
same work.




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293473#comment-15293473
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-220628229
  
Thank you for the update @gallenvara and for your patience with our 
continuous comments :)
How many edges did the graphs in your experiment have?


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-220628229
  
Thank you for the update @gallenvara and for your patience with our 
continuous comments :)
How many edges did the graphs in your experiment have?




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293470#comment-15293470
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64054001
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements MapFunction, String> {
--- End diff --

Ah when I proposed to label the edges as "authority" or "hub" I didn't 
really mean to add a `String` label :)
We can do this with a boolean. @greghogan what do you think?


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-20 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64054001
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) {
return initVertexValue;
}
}
+
+   public static class AuthorityEdgeMapper implements MapFunction, String> {
--- End diff --

Ah when I proposed to label the edges as "authority" or "hub" I didn't 
really mean to add a `String` label :)
We can do this with a boolean. @greghogan what do you think?




[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293452#comment-15293452
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r64051693
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) {
				iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue());
			}
			for (Edge edge : getEdges()) {
-				K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget();
-				K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource();
-				double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue();
-
-				if (!messageTarget.equals(vertex.getId())) {
-					if (getSuperstepNumber() != maxIteration) {
-						sendMessageTo(messageTarget, messageValue / iterationValueSum);
-
-						// in order to make every vertex updated
-						sendMessageTo(messageSource, 0.0);
+				if (getSuperstepNumber() != maxIteration) {
+					if (getSuperstepNumber() % 2 == 1) {
+						if (edge.getValue().equals("Authority")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum);
+						}
					} else {
-						sendMessageTo(messageSource, iterationValueSum);
+						if (edge.getValue().equals("Hub")) {
+							sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum);
+						}
+					}
+
+					// make all the vertices be updated
+					sendMessageTo(edge.getSource(), 0.0);
--- End diff --

Why do we need to send this message?


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293439#comment-15293439
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1980#issuecomment-220620841
  
Thanks for the update @greghogan. I left a few comments for improvements to 
the docs. After these are addressed, it should be good to merge.


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.





[GitHub] flink pull request: [FLINK-3780] [gelly] Jaccard Similarity

2016-05-20 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64049343
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -2055,22 +2055,22 @@ vertex and edge in the output graph stores the common group value and the number
 ### Jaccard Index
 
  Overview
-The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to
-1.0 (all neighbors are common).
+The Jaccard Index measures the similarity between vertex neighborhoods and is computed as the number of shared numbers
+divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are
+shared).
 
  Details
-Counting common neighbors for pairs of vertices is equivalent to counting the two-paths consisting of two edges
-connecting the two vertices to the common neighbor. The number of distinct neighbors for pairs of vertices is computed
-by storing the sum of degrees of the vertex pair and subtracting the count of common neighbors, which are double-counted
-in the sum of degrees.
+Counting shared neighbors for pairs of vertices is equivalent to counting connecting paths of length two. The number of
+distinct neighbors is computed by storing the sum of degrees of the vertex pair and subtracting the count of shared
+neighbors, which are double-counted in the sum of degrees.
 
-The algorithm first annotates each edge with the endpoint degree. Grouping on the midpoint vertex, each pair of
-neighbors is emitted with the endpoint degree sum. Grouping on two-paths, the common neighbors are counted.
+The algorithm first annotates each edge with the target vertex's degree. Grouping on the source vertex, each pair of
+neighbors is emitted with the degree sum. Grouping on two-paths, the shared neighbors are counted.
 
  Usage
 The algorithm takes a simple, undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs,
-the number of common neighbors, and the number of distinct neighbors. The graph ID type must be `Comparable` and
-`Copyable`.
+the number of shared neighbors, and the number of distinct neighbors. The result class provides a method to compute the
+Jaccard Index score. The graph ID type must be `Comparable` and `Copyable`.
--- End diff --

Here we should also document what is the output of the algorithm, i.e. the 
`Result` type and how to get the jaccard similarity out of it.
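The counting trick described in the documentation above (count shared neighbors by grouping length-two paths on their middle vertex, then derive the distinct-neighbor count as the degree sum minus the shared count, which is double-counted in the degree sum) can be sketched in plain Python. This is only an illustration of the arithmetic, not the Gelly `JaccardIndex` operator; the `jaccard_index` helper is hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def jaccard_index(edges):
    """Jaccard scores for a simple, undirected graph given as (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Count shared neighbors: each length-two path u - mid - v contributes
    # one shared neighbor (mid) to the pair (u, v).
    shared = defaultdict(int)
    for mid in neighbors:
        for u, v in combinations(sorted(neighbors[mid]), 2):
            shared[(u, v)] += 1
    # distinct = degree(u) + degree(v) - shared, since shared neighbors are
    # double-counted in the degree sum.
    return {
        (u, v): count / (len(neighbors[u]) + len(neighbors[v]) - count)
        for (u, v), count in shared.items()
    }
```

Only pairs with at least one shared neighbor appear in the result, matching the "all non-zero similarity scores" wording of the issue.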


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---






[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293432#comment-15293432
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64048955
  
--- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java ---
@@ -43,11 +43,13 @@
 import java.util.List;
 
 /**
- * The Jaccard Index measures the similarity between vertex neighborhoods.
- * Scores range from 0.0 (no common neighbors) to 1.0 (all neighbors are common).
+ * The Jaccard Index measures the similarity between vertex neighborhoods and
+ * is computed as the number of shared numbers divided by the number of
--- End diff --

numbers -> neighbors


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.







[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293431#comment-15293431
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64048499
  
--- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/JaccardIndex.java ---
@@ -54,73 +57,112 @@
 
 	public static final boolean DEFAULT_CLIP_AND_FLIP = true;
 
+	private static void printUsage() {
+		System.out.println(WordUtils.wrap("The Jaccard Index measures the similarity between vertex" +
+			" neighborhoods and is computed as the number of shared numbers divided by the number of" +
--- End diff --

numbers -> neighbors


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.







[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293430#comment-15293430
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64048412
  
--- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/JaccardIndex.java ---
@@ -40,9 +43,9 @@
 /**
  * Driver for the library implementation of Jaccard Index.
  *
- * This example generates an undirected RMat graph with the given scale and
- * edge factor then calculates all non-zero Jaccard Index similarity scores
- * between vertices.
+ * This example reads a simple, undirected graph from a CSV file or generates
--- End diff --

remove one "generates"


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.





[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293429#comment-15293429
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64048334
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -2250,14 +2250,33 @@ graph.run(new TranslateVertexValues(new LongValueAddOffset(vertexCount)));
 
 
-  translate.TranslateEdgeValues
+  asm.translate.TranslateEdgeValues
   
 Translate edge values using the given TranslateFunction.
 {% highlight java %}
 graph.run(new TranslateEdgeValues(new Nullify()));
 {% endhighlight %}
   
 
+
+
+  library.similarity.JaccardIndex
+  
+    Measures the similarity between vertex neighborhoods. The Jaccard Index score is computed as the number of shared numbers divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are shared).
--- End diff --

Why did you add this here and not in the "Usage" section of the library 
method?
I find it a bit confusing... You describe graph algorithms as building 
blocks for other algorithms. Does Jaccard index fall in this category?


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.







[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293424#comment-15293424
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64047578
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -2055,22 +2055,22 @@ vertex and edge in the output graph stores the common group value and the number
 ### Jaccard Index
 
  Overview
-The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to
-1.0 (all neighbors are common).
+The Jaccard Index measures the similarity between vertex neighborhoods and is computed as the number of shared numbers
--- End diff --

shared numbers -> shared *neighbors?


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.





[GitHub] flink pull request: [FLINK-3780] [gelly] Jaccard Similarity

2016-05-20 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1980#issuecomment-220600881
  
@vasia all updates should be in place.




[jira] [Commented] (FLINK-3780) Jaccard Similarity

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293324#comment-15293324
 ] 

ASF GitHub Bot commented on FLINK-3780:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1980#discussion_r64037567
  
--- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/asm/degree/annotate/TranslateEdgeDegreeToIntValue.java ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.asm.degree.annotate;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.graph.asm.translate.TranslateFunction;
+import org.apache.flink.types.IntValue;
+import org.apache.flink.types.LongValue;
+
+/**
+ * Translate the edge degree returned by the degree annotation functions from
+ * {@link LongValue} to {@link IntValue}.
+ *
+ * @param  edge value type
+ */
+public class TranslateEdgeDegreeToIntValue
--- End diff --

I thought on this for a while, and decided that the best option is to 
subsume the translation in the next operation, and discovered that I had 
already made that change. So the class just needs to be removed.


> Jaccard Similarity
> --
>
> Key: FLINK-3780
> URL: https://issues.apache.org/jira/browse/FLINK-3780
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity 
> scores. This algorithm is similar to {{TriangleListing}} but instead of 
> joining two-paths against an edge list we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which 
> relies on {{Graph.getTriplets()}} so only computes similarity scores for 
> neighbors but not neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as 
> Adamic-Adar similarity where the sum of endpoint degrees is replaced by the 
> degree of the middle vertex.







[jira] [Created] (FLINK-3944) Add optimization rules to reorder Cartesian products and joins

2016-05-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3944:


 Summary: Add optimization rules to reorder Cartesian products and 
joins
 Key: FLINK-3944
 URL: https://issues.apache.org/jira/browse/FLINK-3944
 Project: Flink
  Issue Type: Bug
  Components: Table API
Affects Versions: 1.1.0
Reporter: Fabian Hueske
Priority: Critical
 Fix For: 1.1.0


Currently, we do not support the execution of Cartesian products. 
Because we do not optimize the order of joins (due to missing statistics), 
joins are executed in the order in which they are specified. This works well 
for the Table API; however, it can be problematic for SQL queries, where the 
order of tables in the FROM clause should not matter.

In case of SQL queries, it can happen that the optimized plan contains 
Cartesian products because joins are not reordered. If we add optimization 
rules that switch Cartesian products and joins, such situations can be resolved.
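The effect can be illustrated with a small relational sketch in plain Python (hypothetical tables `a`, `b`, `c`; not Calcite or Table API code): when the predicates connect `a` and `b` only through `c`, joining in FROM-clause order forces a Cartesian product of `a` and `b`, while reordering lets every join step use a predicate.

```python
from itertools import product

# Hypothetical tables for "FROM a, b, c WHERE a.x = c.x AND b.y = c.y".
a = [{"x": i} for i in range(100)]
b = [{"y": i} for i in range(100)]
c = [{"x": i, "y": i} for i in range(100)]

# FROM-clause order: (a x b) has no connecting predicate, so the first step
# is a Cartesian product with 100 * 100 intermediate rows.
cartesian = list(product(a, b))

# Reordered: join a with c on x first, then join b on y. Every step uses a
# predicate and intermediate results stay at 100 rows.
ac = [(ra, rc) for ra in a for rc in c if ra["x"] == rc["x"]]
abc = [(ra, rb, rc) for (ra, rc) in ac for rb in b if rb["y"] == rc["y"]]
```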





[jira] [Created] (FLINK-3943) Add support for EXCEPT (set minus)

2016-05-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3943:


 Summary: Add support for EXCEPT (set minus)
 Key: FLINK-3943
 URL: https://issues.apache.org/jira/browse/FLINK-3943
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Affects Versions: 1.1.0
Reporter: Fabian Hueske
Priority: Minor


Currently, the Table API and SQL do not support EXCEPT.

EXCEPT can be executed as a coGroup on all fields that forwards a record of the 
first input only if the corresponding group of the second input is empty.

In order to add support for EXCEPT to the Table API and SQL we need to:
- Implement a {{DataSetMinus}} class that translates an EXCEPT into a DataSet 
API program using a coGroup on all fields.
- Implement a {{DataSetMinusRule}} that translates a Calcite {{LogicalMinus}} 
into a {{DataSetMinus}}.
- Extend the Table API (and validation phase) to provide an except() method.
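The coGroup-based strategy can be sketched outside Flink. This is a minimal Python model of the semantics only (the function name is hypothetical; the actual {{DataSetMinus}} would use a DataSet coGroup on all fields): a left record is forwarded exactly once, and only when no equal record exists on the right.

```python
def except_all_fields(left, right):
    """Model of EXCEPT as a coGroup on the whole record: a left record
    is emitted iff its co-group on the right is empty. Left duplicates
    collapse to one record (set-minus semantics)."""
    right_set = set(right)      # records whose right co-group is non-empty
    seen = set()
    out = []
    for rec in left:
        if rec not in right_set and rec not in seen:
            seen.add(rec)       # emit each surviving record once
            out.append(rec)
    return out

print(except_all_fields([(1, "a"), (2, "b"), (2, "b"), (3, "c")],
                        [(2, "b")]))  # [(1, 'a'), (3, 'c')]
```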



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3942) Add support for INTERSECT

2016-05-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3942:


 Summary: Add support for INTERSECT
 Key: FLINK-3942
 URL: https://issues.apache.org/jira/browse/FLINK-3942
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Affects Versions: 1.1.0
Reporter: Fabian Hueske
Priority: Minor


Currently, the Table API and SQL do not support INTERSECT.

INTERSECT can be executed as a join on all fields.

In order to add support for INTERSECT to the Table API and SQL we need to:
- Implement a {{DataSetIntersect}} class that translates an INTERSECT into a 
DataSet API program using a join on all fields.
- Implement a {{DataSetIntersectRule}} that translates a Calcite 
{{LogicalIntersect}} into a {{DataSetIntersect}}.
- Extend the Table API (and validation phase) to provide an intersect() method.
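As with EXCEPT, the semantics can be modeled in a few lines of Python (the function name is hypothetical; the duplicate elimination reflects SQL INTERSECT's set semantics and is my assumption, not a statement about {{DataSetIntersect}} internals):

```python
def intersect_all_fields(left, right):
    """Model of INTERSECT as a join on the whole record: a record is
    emitted once iff it appears in both inputs."""
    right_set = set(right)      # join side, keyed on all fields
    seen = set()
    out = []
    for rec in left:
        if rec in right_set and rec not in seen:
            seen.add(rec)       # set semantics: emit each match once
            out.append(rec)
    return out

print(intersect_all_fields([(1,), (2,), (2,)], [(2,), (3,)]))  # [(2,)]
```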



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3941) Add support for UNION (with duplicate elimination)

2016-05-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3941:


 Summary: Add support for UNION (with duplicate elimination)
 Key: FLINK-3941
 URL: https://issues.apache.org/jira/browse/FLINK-3941
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Affects Versions: 1.1.0
Reporter: Fabian Hueske
Priority: Minor


Currently, only UNION ALL is supported by Table API and SQL.

UNION (with duplicate elimination) can be supported by applying a 
{{DataSet.distinct()}} after the union on all fields. This issue includes:

- Extending {{DataSetUnion}}
- Relaxing {{DataSetUnionRule}} to translate non-all unions.
- Extending the Table API with a union() method.
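The proposed strategy (UNION ALL followed by a distinct on all fields) is easy to model in Python; the function name is illustrative, not a Flink API:

```python
def union_distinct(left, right):
    """UNION with duplicate elimination, modeled as UNION ALL
    followed by a distinct() on all fields."""
    seen = set()
    out = []
    for rec in left + right:    # union all
        if rec not in seen:     # distinct on all fields
            seen.add(rec)
            out.append(rec)
    return out

print(union_distinct([1, 2], [2, 3]))  # [1, 2, 3]
```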



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3940) Add support for ORDER BY OFFSET FETCH

2016-05-20 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-3940:
-
Priority: Minor  (was: Major)

> Add support for ORDER BY OFFSET FETCH
> -
>
> Key: FLINK-3940
> URL: https://issues.apache.org/jira/browse/FLINK-3940
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Priority: Minor
>
> Currently only ORDER BY without OFFSET and FETCH is supported.
> This issue tracks the effort to add support for OFFSET and FETCH and involves:
> - Implementing the execution strategy in `DataSetSort`
> - adapting the `DataSetSortRule` to support OFFSET and FETCH
> - extending the Table API and validation to support OFFSET and FETCH and 
> generate a corresponding RelNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3940) Add support for ORDER BY OFFSET FETCH

2016-05-20 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-3940:
-
Issue Type: New Feature  (was: Bug)

> Add support for ORDER BY OFFSET FETCH
> -
>
> Key: FLINK-3940
> URL: https://issues.apache.org/jira/browse/FLINK-3940
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>
> Currently only ORDER BY without OFFSET and FETCH is supported.
> This issue tracks the effort to add support for OFFSET and FETCH and involves:
> - Implementing the execution strategy in `DataSetSort`
> - adapting the `DataSetSortRule` to support OFFSET and FETCH
> - extending the Table API and validation to support OFFSET and FETCH and 
> generate a corresponding RelNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3940) Add support for ORDER BY OFFSET FETCH

2016-05-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3940:


 Summary: Add support for ORDER BY OFFSET FETCH
 Key: FLINK-3940
 URL: https://issues.apache.org/jira/browse/FLINK-3940
 Project: Flink
  Issue Type: Bug
  Components: Table API
Affects Versions: 1.1.0
Reporter: Fabian Hueske


Currently only ORDER BY without OFFSET and FETCH is supported.
This issue tracks the effort to add support for OFFSET and FETCH and involves:

- Implementing the execution strategy in `DataSetSort`
- adapting the `DataSetSortRule` to support OFFSET and FETCH
- extending the Table API and validation to support OFFSET and FETCH and 
generate a corresponding RelNode.
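The execution strategy amounts to: sort totally, skip OFFSET records, return at most FETCH records. A minimal single-node Python sketch (the function name is hypothetical; the real `DataSetSort` strategy additionally has to deal with distributed, range-partitioned data):

```python
def sort_offset_fetch(records, key, offset=0, fetch=None):
    """Model of ORDER BY ... OFFSET n FETCH m: totally sort, skip
    'offset' records, then return at most 'fetch' records."""
    ordered = sorted(records, key=key)
    end = None if fetch is None else offset + fetch
    return ordered[offset:end]

print(sort_offset_fetch([5, 3, 1, 4, 2], key=lambda x: x,
                        offset=1, fetch=2))  # [2, 3]
```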



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3475) DISTINCT aggregate function support

2016-05-20 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293052#comment-15293052
 ] 

Fabian Hueske commented on FLINK-3475:
--

DISTINCT aggregates can be computed by sorting the reduce group on the distinct 
attribute (secondary sort) and not considering duplicate values. A first step 
would be to add support for a single distinct attribute (groups can only be 
primarily sorted on one attribute). 

In case of multiple distinct aggregates, we have to split the aggregation into 
several group reduce operators and join the result afterwards. The join can be 
done locally and in a streamed merge join (partitioning and sorting will be 
preserved).
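The single-distinct-attribute strategy above (secondary sort on the distinct attribute, skipping values equal to their predecessor) can be sketched in Python; names and the per-group call shape are illustrative, not the Flink group-reduce API:

```python
def distinct_agg(group, distinct_key, agg=sum):
    """Model of a DISTINCT aggregate in one group reduce: secondary-sort
    the group's records on the distinct attribute and skip a value that
    equals its predecessor, so each distinct value is aggregated once."""
    values = sorted(rec[distinct_key] for rec in group)  # secondary sort
    distinct_values = []
    prev = object()                 # sentinel unequal to any value
    for v in values:
        if v != prev:               # duplicate of predecessor -> skipped
            distinct_values.append(v)
        prev = v
    return agg(distinct_values)

# SUM(DISTINCT x) and COUNT(DISTINCT x) over one group:
group = [{"x": 3}, {"x": 1}, {"x": 3}, {"x": 2}, {"x": 1}]
print(distinct_agg(group, "x", agg=sum))  # 6
print(distinct_agg(group, "x", agg=len))  # 3
```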

> DISTINCT aggregate function support
> ---
>
> Key: FLINK-3475
> URL: https://issues.apache.org/jira/browse/FLINK-3475
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>
> DISTINCT aggregate function may be able to reuse the aggregate function 
> instead of separate implementation, and let Flink runtime take care of 
> duplicate records.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3939) Prevent distinct aggregates and grouping sets from being translated

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293029#comment-15293029
 ] 

ASF GitHub Bot commented on FLINK-3939:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2014

[FLINK-3939] [tableAPI] Prevent translation of unsupported distinct 
aggregates and grouping sets.

PR fixes `DataSetAggregateRule` to not translate `LogicalAggregate` 
operators that include distinct aggregates or grouping sets. Such aggregations 
are currently not supported by `DataSetAggregate`.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Documentation
  - Bug fix does not require docs

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableDistAggs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2014


commit ed9d68dece81346978564bf4185e97f7a2cf6e74
Author: Fabian Hueske 
Date:   2016-05-19T22:14:00Z

[FLINK-3939] [tableAPI] Prevent translation of unsupported distinct 
aggregates and grouping sets.




> Prevent distinct aggregates and grouping sets from being translated
> ---
>
> Key: FLINK-3939
> URL: https://issues.apache.org/jira/browse/FLINK-3939
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Flink's SQL interface is currently not capable of executing distinct 
> aggregates and grouping sets.
> We need to prevent that queries with these operations are translated by 
> adapting the DataSetAggregateRule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3939] [tableAPI] Prevent translation of...

2016-05-20 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2014

[FLINK-3939] [tableAPI] Prevent translation of unsupported distinct 
aggregates and grouping sets.

PR fixes `DataSetAggregateRule` to not translate `LogicalAggregate` 
operators that include distinct aggregates or grouping sets. Such aggregations 
are currently not supported by `DataSetAggregate`.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Documentation
  - Bug fix does not require docs

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableDistAggs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2014


commit ed9d68dece81346978564bf4185e97f7a2cf6e74
Author: Fabian Hueske 
Date:   2016-05-19T22:14:00Z

[FLINK-3939] [tableAPI] Prevent translation of unsupported distinct 
aggregates and grouping sets.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292998#comment-15292998
 ] 

ASF GitHub Bot commented on FLINK-3909:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/2003#issuecomment-220545613
  
The Travis builds here also segfault in all RocksDB tests. I'm assuming 
these tests failed silently in the past, since I can't make any connection to the 
test plugin changes.


> Maven Failsafe plugin may report SUCCESS on failed tests
> 
>
> Key: FLINK-3909
> URL: https://issues.apache.org/jira/browse/FLINK-3909
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> The following build completed successfully on Travis but there are actually 
> test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3909] update Maven Failsafe version

2016-05-20 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/2003#issuecomment-220545613
  
The Travis builds here also segfault in all RocksDB tests. I'm assuming 
these tests failed silently in the past, since I can't make any connection to the 
test plugin changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3933) Add an auto-type-extracting DeserializationSchema

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292966#comment-15292966
 ] 

ASF GitHub Bot commented on FLINK-3933:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2010


> Add an auto-type-extracting DeserializationSchema
> -
>
> Key: FLINK-3933
> URL: https://issues.apache.org/jira/browse/FLINK-3933
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.1.0
>
>
> When creating a {{DeserializationSchema}}, people need to manually worry 
> about how to provide the produced type's {{TypeInformation}}.
> We should add a base utility {{AbstractDeserializationSchema}} that provides 
> that automatically via the type extractor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3933] [streaming API] Add AbstractDeser...

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2010


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292881#comment-15292881
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r6491
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception {
for (int position : fields) {
// Save position of compared key
// Get both values - both implement comparable
-   Comparable comparable1 = value1.getFieldNotNull(position);
-   Comparable comparable2 = value2.getFieldNotNull(position);
+   Comparable comparable1 = ((Tuple)value1).getFieldNotNull(position);
--- End diff --

@tillrohrmann - I have made the needed updates and can now see that maxBy 
and minBy work with Scala tuples. It took some time as I was busy with other 
things, but I found time to complete this, and learnt some Scala in the 
process. Let me know what you think of this last commit. 
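The reduce logic under discussion (compare the two candidate records on the selected key positions in order; the first differing position decides) can be modeled outside Flink. This is a semantic sketch only, with an illustrative name, not the actual `SelectByMinFunction`:

```python
def select_by_min(value1, value2, fields):
    """Model of a min-by reduce: compare both records on the given key
    positions in order; the first non-equal position decides the winner.
    If all key fields compare equal, value1 is kept."""
    for position in fields:
        c1, c2 = value1[position], value2[position]
        if c1 < c2:
            return value1
        if c2 < c1:
            return value2
        # equal on this position: fall through to the next key field
    return value1

print(select_by_min((1, "b"), (1, "a"), fields=[0, 1]))  # (1, 'a')
```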


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-05-20 Thread ramkrish86
Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r6491
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception {
for (int position : fields) {
// Save position of compared key
// Get both values - both implement comparable
-   Comparable comparable1 = value1.getFieldNotNull(position);
-   Comparable comparable2 = value2.getFieldNotNull(position);
+   Comparable comparable1 = ((Tuple)value1).getFieldNotNull(position);
--- End diff --

@tillrohrmann - I have made the needed updates and can now see that maxBy 
and minBy work with Scala tuples. It took some time as I was busy with other 
things, but I found time to complete this, and learnt some Scala in the 
process. Let me know what you think of this last commit. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---