[jira] [Created] (FLINK-3947) Provide low level access to RocksDB state backend
Elias Levy created FLINK-3947: - Summary: Provide low level access to RocksDB state backend Key: FLINK-3947 URL: https://issues.apache.org/jira/browse/FLINK-3947 Project: Flink Issue Type: Improvement Components: state backends Affects Versions: 1.0.3 Reporter: Elias Levy The current state API is limiting and some implementations are not as efficient as they could be, particularly when working with large states. For instance, a ListState is append only. You cannot remove values from the list. And the RocksDBListState get() implementation reads all list values from RocksDB instead of returning an Iterable that only reads values as needed. Furthermore, RocksDB is an ordered KV store, yet there is no ordered map state API with an ability to iterate over the stored values in order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
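The inefficiency called out above is easy to demonstrate in miniature. Below is a hedged sketch (plain Java, no RocksDB, all names hypothetical and not part of Flink's API) of the alternative the report asks for: a `get()` that returns a lazy `Iterable`, deserializing each stored entry only when the caller advances the iterator, instead of materializing the whole list up front.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a KV store holding serialized list entries.
class LazyListState {
    private final List<byte[]> raw = new ArrayList<>();

    void add(long value) {
        // Store the value in serialized form, as a state backend would.
        raw.add(ByteBuffer.allocate(8).putLong(value).array());
    }

    // Returns a lazy view: each entry is deserialized only when next()
    // is called, so large lists are never read into memory all at once.
    Iterable<Long> get() {
        return () -> new Iterator<Long>() {
            private final Iterator<byte[]> it = raw.iterator();
            public boolean hasNext() { return it.hasNext(); }
            public Long next() { return ByteBuffer.wrap(it.next()).getLong(); }
        };
    }
}
```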
[jira] [Commented] (FLINK-3922) Infinite recursion on TypeExtractor
[ https://issues.apache.org/jira/browse/FLINK-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294193#comment-15294193 ] Flavio Pompermaier commented on FLINK-3922: --- I confirm that this patch fixed my test! However, I'd add a test to be sure not to reintroduce this problem in the future. Thanks Timo
> Infinite recursion on TypeExtractor
> ---
>
> Key: FLINK-3922
> URL: https://issues.apache.org/jira/browse/FLINK-3922
> Project: Flink
> Issue Type: Bug
> Components: Type Serialization System
> Affects Versions: 1.0.2
> Reporter: Flavio Pompermaier
> Assignee: Timo Walther
> Priority: Critical
>
> This program causes a StackOverflow (infinite recursion) in the TypeExtractor:
> {code:title=TypeSerializerStackOverflowOnRecursivePojo.java|borderStyle=solid}
> public class TypeSerializerStackOverflowOnRecursivePojo {
>
>     public static class RecursivePojo<K, V> implements Serializable {
>         private static final long serialVersionUID = 1L;
>
>         private RecursivePojo<K, V> parent;
>
>         public RecursivePojo() {}
>
>         public RecursivePojo(K k, V v) {}
>
>         public RecursivePojo<K, V> getParent() {
>             return parent;
>         }
>
>         public void setParent(RecursivePojo<K, V> parent) {
>             this.parent = parent;
>         }
>     }
>
>     public static class TypedTuple extends Tuple3<String, String, RecursivePojo<String, Map<String, String>>> {
>         private static final long serialVersionUID = 1L;
>     }
>
>     public static void main(String[] args) throws Exception {
>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>         env.fromCollection(Arrays.asList(new RecursivePojo<String, Map<String, String>>("test", new HashMap<String, String>())))
>             .map(t -> { TypedTuple ret = new TypedTuple(); ret.setFields("1", "1", t); return ret; })
>             .returns(TypedTuple.class)
>             .print();
>     }
> }
> {code}
> The thrown Exception is the following:
> {code:title=Exception thrown}
> Exception in thread "main" java.lang.StackOverflowError
> at sun.reflect.generics.parser.SignatureParser.parsePackageNameAndSimpleClassTypeSignature(SignatureParser.java:328)
> at
> sun.reflect.generics.parser.SignatureParser.parseClassTypeSignature(SignatureParser.java:310)
> at sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:289)
> at sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:283)
> at sun.reflect.generics.parser.SignatureParser.parseTypeSignature(SignatureParser.java:485)
> at sun.reflect.generics.parser.SignatureParser.parseReturnType(SignatureParser.java:627)
> at sun.reflect.generics.parser.SignatureParser.parseMethodTypeSignature(SignatureParser.java:577)
> at sun.reflect.generics.parser.SignatureParser.parseMethodSig(SignatureParser.java:171)
> at sun.reflect.generics.repository.ConstructorRepository.parse(ConstructorRepository.java:55)
> at sun.reflect.generics.repository.ConstructorRepository.parse(ConstructorRepository.java:43)
> at sun.reflect.generics.repository.AbstractRepository.<init>(AbstractRepository.java:74)
> at sun.reflect.generics.repository.GenericDeclRepository.<init>(GenericDeclRepository.java:49)
> at sun.reflect.generics.repository.ConstructorRepository.<init>(ConstructorRepository.java:51)
> at sun.reflect.generics.repository.MethodRepository.<init>(MethodRepository.java:46)
> at sun.reflect.generics.repository.MethodRepository.make(MethodRepository.java:59)
> at java.lang.reflect.Method.getGenericInfo(Method.java:102)
> at java.lang.reflect.Method.getGenericReturnType(Method.java:255)
> at org.apache.flink.api.java.typeutils.TypeExtractor.isValidPojoField(TypeExtractor.java:1610)
> at org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1671)
> at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1559)
> at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:732)
> at org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1678)
> at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1559)
> at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:732)
> at org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1678)
> at
[jira] [Created] (FLINK-3946) State API Should Support Data Expiration
Elias Levy created FLINK-3946: - Summary: State API Should Support Data Expiration Key: FLINK-3946 URL: https://issues.apache.org/jira/browse/FLINK-3946 Project: Flink Issue Type: Improvement Components: state backends Affects Versions: 1.0.3 Reporter: Elias Levy The state API should support data expiration. Consider a custom data stream operator that operates on a keyed stream and maintains a cache of items in its keyed state. The operator can expire items in the state based on event time or processing time. But as the state is keyed, if the operator is never invoked again for a previously observed key, the state associated with that key will never be expired, leading the overall job state to grow indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
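The limitation described above can be sketched in plain Java (illustrative only, not Flink's state API): if expired entries are discovered only when their own key is accessed, a key that is never seen again keeps its state forever.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative cache whose entries carry an expiration deadline.
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long deadline;
        Entry(V value, long deadline) { this.value = value; this.deadline = deadline; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();

    // Store a value with the (event- or processing-time) instant after
    // which it should be considered expired.
    void put(K key, V value, long deadlineMillis) {
        entries.put(key, new Entry<>(value, deadlineMillis));
    }

    V get(K key, long nowMillis) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (e.deadline <= nowMillis) {
            // Clean up on access. A key that is never accessed again never
            // reaches this branch, so its entry lingers indefinitely -- the
            // unbounded state growth described in the issue.
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    int size() { return entries.size(); }
}
```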
[jira] [Closed] (FLINK-3928) Potential overflow due to 32-bit int arithmetic
[ https://issues.apache.org/jira/browse/FLINK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-3928. - Resolution: Fixed Fixed in 723766993ce659f7316c890fcce7d0633957e448 > Potential overflow due to 32-bit int arithmetic > --- > > Key: FLINK-3928 > URL: https://issues.apache.org/jira/browse/FLINK-3928 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.1.0 > > > The following pattern occurs in both LocalClusteringCoefficient.java and > TriangleListing.java : > {code} > int scale = parameters.getInt("scale", DEFAULT_SCALE); > ... > long vertexCount = 1 << scale; > {code} > "1 << scale" may overflow with type "int" due to the use of 32-bit arithmetic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
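The overflow is reproducible in two lines. A minimal sketch of the buggy and corrected expressions (method names are illustrative, not the actual Gelly code): in Java, `1 << scale` is evaluated entirely in 32-bit int arithmetic (with the shift count masked to 5 bits) before being widened to long, so the fix is to use a long literal.

```java
class ShiftOverflow {
    // Buggy: "1 << scale" is computed as an int, then widened to long.
    static long vertexCountBuggy(int scale) {
        return 1 << scale;
    }

    // Fixed: the long literal forces 64-bit shift arithmetic.
    static long vertexCountFixed(int scale) {
        return 1L << scale;
    }
}
```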
[jira] [Closed] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-3780. - Resolution: Fixed Implemented in 8ed368582a8550ccb5941dfd4e8412dad8ca34c0 > Jaccard Similarity > -- > > Key: FLINK-3780 > URL: https://issues.apache.org/jira/browse/FLINK-3780 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Implement a Jaccard Similarity algorithm computing all non-zero similarity > scores. This algorithm is similar to {{TriangleListing}} but instead of > joining two-paths against an edge list we count two-paths. > {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which > relies on {{Graph.getTriplets()}} so only computes similarity scores for > neighbors but not neighbors-of-neighbors. > This algorithm is easily modified for other similarity scores such as > Adamic-Adar similarity where the sum of endpoint degrees is replaced by the > degree of the middle vertex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
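For reference, the score being computed: the Jaccard similarity of two vertices is the size of the intersection of their neighbor sets divided by the size of their union. A small illustrative sketch (not the Gelly implementation):

```java
import java.util.HashSet;
import java.util.Set;

class Jaccard {
    // Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 for two
    // empty sets to avoid dividing by zero.
    static double similarity(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }
}
```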
[jira] [Commented] (FLINK-3928) Potential overflow due to 32-bit int arithmetic
[ https://issues.apache.org/jira/browse/FLINK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294118#comment-15294118 ] ASF GitHub Bot commented on FLINK-3928: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2006
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294119#comment-15294119 ] ASF GitHub Bot commented on FLINK-3780: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1980
[GitHub] flink pull request: [FLINK-3928] [gelly] Potential overflow due to...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2006 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-3780] [gelly] Jaccard Similarity
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1980
[jira] [Created] (FLINK-3945) Degree annotation for directed graphs
Greg Hogan created FLINK-3945: - Summary: Degree annotation for directed graphs Key: FLINK-3945 URL: https://issues.apache.org/jira/browse/FLINK-3945 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.1.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Fix For: 1.1.0 There is a third degree count for vertices in directed graphs which is the distinct count of out- and in-neighbors. This also adds edge annotation of the vertex degrees for directed graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3928) Potential overflow due to 32-bit int arithmetic
[ https://issues.apache.org/jira/browse/FLINK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293692#comment-15293692 ] ASF GitHub Bot commented on FLINK-3928: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/2006#issuecomment-220658635 Merging ...
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293698#comment-15293698 ] ASF GitHub Bot commented on FLINK-3780: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1980#issuecomment-220658718 Thanks for the reviews @vasia! Merging ...
[GitHub] flink pull request: [FLINK-3928] [gelly] Potential overflow due to...
Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/2006#issuecomment-220658635 Merging ...
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293669#comment-15293669 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220656004 Stanford provides several old graph datasets at https://snap.stanford.edu/data/index.html which might prove a better standard for benchmarking. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220656004 Stanford provides several old graph datasets at https://snap.stanford.edu/data/index.html which might prove a better standard for benchmarking.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293667#comment-15293667 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64070513 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- A boolean is stored as a byte, as would be a bitmask, which would allow the edge set to be compressed for bidirectional edges. Not sure how common bidirectional edges are in real-world datasets.
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64070513 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- A boolean is stored as a byte, as would be a bitmask, which would allow the edge set to be compressed for bidirectional edges. Not sure how common bidirectional edges are in real-world datasets.
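The bitmask idea from the review comment can be sketched as follows (flag values and names are assumptions, not the actual Gelly code): each directed edge carries a byte of direction flags, so the two opposite edges of a bidirectional pair can be merged into a single stored edge.

```java
class EdgeFlags {
    // Assumed flag values: bit 0 marks the u -> v direction,
    // bit 1 marks the v -> u direction.
    static final byte FORWARD = 0x1;
    static final byte REVERSE = 0x2;

    // Merging the flags of opposite edges collapses a bidirectional
    // pair into one stored edge.
    static byte merge(byte a, byte b) {
        return (byte) (a | b);
    }

    static boolean isBidirectional(byte flags) {
        return (flags & (FORWARD | REVERSE)) == (FORWARD | REVERSE);
    }
}
```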
[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293629#comment-15293629 ] Gabor Gevay commented on FLINK-2144: But why couldn't we provide FoldFunctions for these in the API? > Incremental count, average, and variance for windows > > > Key: FLINK-2144 > URL: https://issues.apache.org/jira/browse/FLINK-2144 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Gabor Gevay >Priority: Minor > Labels: statistics > > By count I mean the number of elements in the window. > These can be implemented very efficiently building on FLINK-2143: > Store: O(1) > Evict: O(1) > emitWindow: O(1) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293609#comment-15293609 ] Aljoscha Krettek commented on FLINK-2144: - Right now we don't really have a place in the docs where we would put such a thing. We could either add such a section or add an example to the built-in Flink examples.
[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293601#comment-15293601 ] Trevor Grant commented on FLINK-2144: - Implement these as convenience functions and call out in the docs that custom implementations are probably more efficient? Or update the docs with a nice, robust example showing how to do this.
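As a concrete illustration of the O(1) store/evict claim in the issue, here is a hedged sketch of an incremental window aggregate keeping count, mean, and variance as running sums (illustrative only, not a proposed Flink API). Tracking the sum and sum of squares makes eviction a trivial inverse update, at some cost in numerical stability compared to Welford's algorithm.

```java
class RunningStats {
    private long count;
    private double sum;
    private double sumSquares;

    // O(1) "store": add an element to the window.
    void add(double x) {
        count++;
        sum += x;
        sumSquares += x * x;
    }

    // O(1) "evict": remove an element by inverting the update.
    void remove(double x) {
        count--;
        sum -= x;
        sumSquares -= x * x;
    }

    long count() { return count; }

    double mean() { return sum / count; }

    // Population variance via E[X^2] - E[X]^2.
    double variance() {
        double m = mean();
        return sumSquares / count - m * m;
    }
}
```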
[jira] [Closed] (FLINK-3893) LeaderChangeStateCleanupTest times out
[ https://issues.apache.org/jira/browse/FLINK-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3893. - Resolution: Fixed Fix Version/s: 1.1.0 9b8de6a6cb3b1cc9ebeb623371e7fef3a6cb763d > LeaderChangeStateCleanupTest times out > -- > > Key: FLINK-3893 > URL: https://issues.apache.org/jira/browse/FLINK-3893 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Minor > Labels: test-stability > Fix For: 1.1.0 > > > {{cluster.waitForTaskManagersToBeRegistered();}} needs to be replaced by > {{cluster.waitForTaskManagersToBeRegistered(timeout);}} > {noformat} > testStateCleanupAfterListenerNotification(org.apache.flink.runtime.leaderelection.LeaderChangeStateCleanupTest) > Time elapsed: 10.106 sec <<< ERROR! > java.util.concurrent.TimeoutException: Futures timed out after [1 > milliseconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) > at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86) > at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.ready(package.scala:86) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:455) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:439) > at > org.apache.flink.runtime.leaderelection.LeaderChangeStateCleanupTest.testStateCleanupAfterListenerNotification(LeaderChangeStateCleanupTest.java:181) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3938) Yarn tests don't run on the current master
[ https://issues.apache.org/jira/browse/FLINK-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293572#comment-15293572 ] ASF GitHub Bot commented on FLINK-3938: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2012 > Yarn tests don't run on the current master > -- > > Key: FLINK-3938 > URL: https://issues.apache.org/jira/browse/FLINK-3938 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > Independently of FLINK-3909, I just discovered that the Yarn tests don't run > on the current master (09b428b). > {noformat} > [INFO] > > [INFO] Building flink-yarn-tests 1.1-SNAPSHOT > [INFO] > > [INFO] > [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- maven-checkstyle-plugin:2.16:check (validate) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ > flink-yarn-tests_2.10 --- > [INFO] Source directory: > /home/travis/build/apache/flink/flink-yarn-tests/src/main/scala added. > [INFO] > [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ > flink-yarn-tests_2.10 --- > [INFO] Using 'UTF-8' encoding to copy filtered resources. 
> [INFO] skip non existing resourceDirectory > /home/travis/build/apache/flink/flink-yarn-tests/src/main/resources > [INFO] Copying 3 resources > [INFO] > [INFO] --- scala-maven-plugin:3.1.4:compile (scala-compile-first) @ > flink-yarn-tests_2.10 --- > [INFO] No sources to compile > [INFO] > [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ > flink-yarn-tests_2.10 --- > [INFO] No sources to compile > [INFO] > [INFO] --- build-helper-maven-plugin:1.7:add-test-source (add-test-source) @ > flink-yarn-tests_2.10 --- > [INFO] Test Source directory: > /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala added. > [INFO] > [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ > flink-yarn-tests_2.10 --- > [INFO] Using 'UTF-8' encoding to copy filtered resources. > [INFO] Copying 1 resource > [INFO] Copying 3 resources > [INFO] > [INFO] --- scala-maven-plugin:3.1.4:testCompile (scala-test-compile) @ > flink-yarn-tests_2.10 --- > [INFO] /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala:-1: > info: compiling > [INFO] Compiling 2 source files to > /home/travis/build/apache/flink/flink-yarn-tests/target/test-classes at > 1463615798796 > [INFO] prepare-compile in 0 s > [INFO] compile in 9 s > [INFO] > [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ > flink-yarn-tests_2.10 --- > [INFO] Nothing to compile - all classes are up to date > [INFO] > [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ > flink-yarn-tests_2.10 --- > [INFO] Surefire report directory: > /home/travis/build/apache/flink/flink-yarn-tests/target/surefire-reports > [WARNING] The system property log4j.configuration is configured twice! The > property appears in and any of , > or user property. > --- > T E S T S > --- > Results : > Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests
[ https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293576#comment-15293576 ] ASF GitHub Bot commented on FLINK-3909: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/2003#issuecomment-220642391 Merging...we need to fix the RocksDB issue afterwards. In the worst case, we'll comment out the relevant test cases for now. The test which seems to cause the segfaults is `EventTimeWindowCheckpointingITCase`. > Maven Failsafe plugin may report SUCCESS on failed tests > > > Key: FLINK-3909 > URL: https://issues.apache.org/jira/browse/FLINK-3909 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > The following build completed successfully on Travis but there are actually > test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3893) LeaderChangeStateCleanupTest times out
[ https://issues.apache.org/jira/browse/FLINK-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293571#comment-15293571 ] ASF GitHub Bot commented on FLINK-3893: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2009
[jira] [Commented] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests
[ https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293573#comment-15293573 ] ASF GitHub Bot commented on FLINK-3909: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2003
[jira] [Commented] (FLINK-3892) ConnectionUtils may die with NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293570#comment-15293570 ] ASF GitHub Bot commented on FLINK-3892: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2008 > ConnectionUtils may die with NullPointerException > - > > Key: FLINK-3892 > URL: https://issues.apache.org/jira/browse/FLINK-3892 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.1.0, 1.0.3 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > If an invalid hostname is specificed or the hostname can't be resolved from > the current interface, {{ConnectionUtils.findAddressUsingStrategy}} may throw > a {{NullPointerException}}. When trying to access the {{InetAddress}} of an > {{InetSocketAddress}}, null is returned when the host could not been resolved. > The solution is to abort the attempt to find the local address if the host > couldn't be resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
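The failure mode described in FLINK-3892 is reproducible with the JDK alone: `InetSocketAddress.getAddress()` returns null when the host could not be resolved. A small sketch of the described fix, aborting (here, returning null) instead of dereferencing; the guard method is illustrative, not the actual ConnectionUtils code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

class ResolveGuard {
    // Check isUnresolved() before touching getAddress(); for an
    // unresolved host, getAddress() is null and any dereference would
    // raise the NullPointerException described in the issue.
    static InetAddress addressOrNull(InetSocketAddress socketAddress) {
        if (socketAddress.isUnresolved()) {
            // Abort the attempt to find the local address.
            return null;
        }
        return socketAddress.getAddress();
    }
}
```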
[GitHub] flink pull request: [FLINK-3909] update Maven Failsafe version
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/2003#issuecomment-220642391 Merging...we need to fix the RocksDB issue afterwards. In the worst case, we'll comment out the relevant test cases for now. The test which seems to cause the segfaults is `EventTimeWindowCheckpointingITCase`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Closed] (FLINK-3938) Yarn tests don't run on the current master
[ https://issues.apache.org/jira/browse/FLINK-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3938. - Resolution: Fixed 9a4fdd5fb3a7097035e74b0bf685553bbfdf7f43 > Yarn tests don't run on the current master > -- > > Key: FLINK-3938 > URL: https://issues.apache.org/jira/browse/FLINK-3938 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > Independently of FLINK-3909, I just discovered that the Yarn tests don't run > on the current master (09b428b). > {noformat} > [INFO] > > [INFO] Building flink-yarn-tests 1.1-SNAPSHOT > [INFO] > > [INFO] > [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- maven-checkstyle-plugin:2.16:check (validate) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-maven) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ > flink-yarn-tests_2.10 --- > [INFO] Source directory: > /home/travis/build/apache/flink/flink-yarn-tests/src/main/scala added. > [INFO] > [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ > flink-yarn-tests_2.10 --- > [INFO] > [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ > flink-yarn-tests_2.10 --- > [INFO] Using 'UTF-8' encoding to copy filtered resources. 
> [INFO] skip non existing resourceDirectory > /home/travis/build/apache/flink/flink-yarn-tests/src/main/resources > [INFO] Copying 3 resources > [INFO] > [INFO] --- scala-maven-plugin:3.1.4:compile (scala-compile-first) @ > flink-yarn-tests_2.10 --- > [INFO] No sources to compile > [INFO] > [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ > flink-yarn-tests_2.10 --- > [INFO] No sources to compile > [INFO] > [INFO] --- build-helper-maven-plugin:1.7:add-test-source (add-test-source) @ > flink-yarn-tests_2.10 --- > [INFO] Test Source directory: > /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala added. > [INFO] > [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ > flink-yarn-tests_2.10 --- > [INFO] Using 'UTF-8' encoding to copy filtered resources. > [INFO] Copying 1 resource > [INFO] Copying 3 resources > [INFO] > [INFO] --- scala-maven-plugin:3.1.4:testCompile (scala-test-compile) @ > flink-yarn-tests_2.10 --- > [INFO] /home/travis/build/apache/flink/flink-yarn-tests/src/test/scala:-1: > info: compiling > [INFO] Compiling 2 source files to > /home/travis/build/apache/flink/flink-yarn-tests/target/test-classes at > 1463615798796 > [INFO] prepare-compile in 0 s > [INFO] compile in 9 s > [INFO] > [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ > flink-yarn-tests_2.10 --- > [INFO] Nothing to compile - all classes are up to date > [INFO] > [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ > flink-yarn-tests_2.10 --- > [INFO] Surefire report directory: > /home/travis/build/apache/flink/flink-yarn-tests/target/surefire-reports > [WARNING] The system property log4j.configuration is configured twice! The > property appears in and any of , > or user property. > --- > T E S T S > --- > Results : > Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-3892) ConnectionUtils may die with NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3892. - Resolution: Fixed 5dfb8a013cbf769bee641c04ebc666a8dc2f3e14
[GitHub] flink pull request: [FLINK-3938] re-enable Yarn tests
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2012
[jira] [Closed] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests
[ https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3909. - Resolution: Fixed 38698c0b101cbb48f8c10adf4060983ac07e2f4b
[GitHub] flink pull request: [FLINK-3893] improve LeaderChangeStateCleanupT...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2009
[GitHub] flink pull request: [FLINK-3892] ConnectionUtils may die with Null...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2008
[GitHub] flink pull request: [FLINK-3909] update Maven Failsafe version
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2003
[jira] [Commented] (FLINK-2144) Incremental count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293562#comment-15293562 ] Aljoscha Krettek commented on FLINK-2144: - I think this can easily be done by users in a window with a {{FoldFunction}}. Doing it in a generic way as in the aggregation functions on {{WindowedStream}} will never be as fast as a custom implementation. I would vote to close this issue. > Incremental count, average, and variance for windows > > > Key: FLINK-2144 > URL: https://issues.apache.org/jira/browse/FLINK-2144 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Gabor Gevay >Priority: Minor > Labels: statistics > > By count I mean the number of elements in the window. > These can be implemented very efficiently building on FLINK-2143: > Store: O(1) > Evict: O(1) > emitWindow: O(1)
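As a sketch of what such a user-side accumulator could look like: Welford's online algorithm maintains count, mean, and variance in O(1) per element, which is exactly the kind of state one could thread through a window {{FoldFunction}}. The class below is plain Java for illustration, not Flink API; names are hypothetical.

```java
// Incremental count / mean / variance accumulator (Welford's online algorithm).
// Each add() is O(1) and numerically stable, matching the O(1) Store cost
// described in the issue.
public class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    public long getCount() { return count; }
    public double getMean() { return mean; }

    // Population variance; use m2 / (count - 1) for the sample variance.
    public double getPopulationVariance() { return count > 1 ? m2 / count : 0.0; }
}
```

Incremental eviction (removing an element in O(1)) needs the inverse update, which Welford's recurrence also supports, though with weaker numerical guarantees.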
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056511 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- I mean that they had a small difference in running time.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293519#comment-15293519 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220633090 @vasia vertex num: 1, edge num: 3; vertex num: 3, edge num: 10. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of the HITS algorithm in the Gelly API using Java. The feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS)
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220633090 @vasia vertex num: 1, edge num: 3; vertex num: 3, edge num: 10.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293511#comment-15293511 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056511 I mean that they had a small difference in running time.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293501#comment-15293501 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056167 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) { iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { - K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); - K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource(); - double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue(); - - if (!messageTarget.equals(vertex.getId())) { - if (getSuperstepNumber() != maxIteration) { - sendMessageTo(messageTarget, messageValue / iterationValueSum); - - // in order to make every vertex updated - sendMessageTo(messageSource, 0.0); + if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() % 2 == 1) { + if (edge.getValue().equals("Authority")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum); + } } else { - sendMessageTo(messageSource, iterationValueSum); + if (edge.getValue().equals("Hub")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum); + } + } + + // make all the vertices be updated + sendMessageTo(edge.getSource(), 0.0); --- End diff -- Fixed with rebasing the previous commit. 
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056167 Fixed with rebasing the previous commit.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293499#comment-15293499 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056076 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- Before committing the newest code, I tested the edges with both `String` and `boolean` labels and found that the `boolean` label was not faster than the `String` label. So finally I used "hub" & "authority", which is also very intuitive.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293488#comment-15293488 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64055156 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) { iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { - K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); - K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource(); - double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue(); - - if (!messageTarget.equals(vertex.getId())) { - if (getSuperstepNumber() != maxIteration) { - sendMessageTo(messageTarget, messageValue / iterationValueSum); - - // in order to make every vertex updated - sendMessageTo(messageSource, 0.0); + if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() % 2 == 1) { + if (edge.getValue().equals("Authority")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum); + } } else { - sendMessageTo(messageSource, iterationValueSum); + if (edge.getValue().equals("Hub")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum); + } + } + + // make all the vertices be updated + sendMessageTo(edge.getSource(), 0.0); --- End diff -- Sorry, this line should be moved. :) In the previous commit without edge label, i use the same edge for hub and authority updating and this line would keep every vertex updating. But in the newest implementation with edge label, we can drop this because the added edges can do the same work. 
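For reference, the alternating hub/authority updates that the diffs above spread across scatter-gather supersteps, with the `Math.sqrt(...)` aggregate acting as an L2 normalization, can be sketched as a plain power iteration. This is an illustrative stand-alone sketch, not the Gelly implementation; class and method names are hypothetical.

```java
import java.util.Arrays;

// Tiny HITS power iteration over an edge list. Authority scores accumulate
// from in-neighbor hubs; hub scores accumulate from out-neighbor authorities.
public class HitsSketch {
    // Returns {hub, authority} score arrays for n vertices.
    static double[][] hits(int n, int[][] edges, int iterations) {
        double[] hub = new double[n];
        Arrays.fill(hub, 1.0);
        double[] auth = new double[n];
        for (int i = 0; i < iterations; i++) {
            double[] newAuth = new double[n];
            for (int[] e : edges) {
                newAuth[e[1]] += hub[e[0]]; // authority update ("Authority" edges)
            }
            normalize(newAuth);
            double[] newHub = new double[n];
            for (int[] e : edges) {
                newHub[e[0]] += newAuth[e[1]]; // hub update ("Hub" edges)
            }
            normalize(newHub);
            auth = newAuth;
            hub = newHub;
        }
        return new double[][]{hub, auth};
    }

    // L2 normalization, matching the sqrt-of-summed-squares aggregate in the diff.
    static void normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) sumSq += x * x;
        double norm = Math.sqrt(sumSq);
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) v[i] /= norm;
        }
    }
}
```

On the toy graph 0→1, 0→2, 1→2, vertex 2 ends up with the highest authority (most in-links) and vertex 0 with the highest hub score.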
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64055156 Sorry, this line should be moved. :) In the previous commit without edge label, i use the same edge for hub and authority updating and this line would keep every vertex updating. But in the newest implementation with edge label, we can drop this because the added edges can do the same work.
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293473#comment-15293473 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220628229 Thank you for the update @gallenvara and for your patience with our continuous comments :) How many edges did the graphs in your experiment have?
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220628229 Thank you for the update @gallenvara and for your patience with our continuous comments :) How many edges did the graphs in your experiment have?
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293470#comment-15293470 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64054001 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- Ah when I proposed to label the edges as "authority" or "hub" I didn't really mean to add a `String` label :) We can do this with a boolean. @greghogan what do you think?
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64054001 Ah when I proposed to label the edges as "authority" or "hub" I didn't really mean to add a `String` label :) We can do this with a boolean. @greghogan what do you think?
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293452#comment-15293452 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64051693 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) { iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { - K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); - K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource(); - double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue(); - - if (!messageTarget.equals(vertex.getId())) { - if (getSuperstepNumber() != maxIteration) { - sendMessageTo(messageTarget, messageValue / iterationValueSum); - - // in order to make every vertex updated - sendMessageTo(messageSource, 0.0); + if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() % 2 == 1) { + if (edge.getValue().equals("Authority")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum); + } } else { - sendMessageTo(messageSource, iterationValueSum); + if (edge.getValue().equals("Hub")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum); + } + } + + // make all the vertices be updated + sendMessageTo(edge.getSource(), 0.0); --- End diff -- Why do we need to send this message? 
[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64051693 Why do we need to send this message?
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293439#comment-15293439 ] ASF GitHub Bot commented on FLINK-3780: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1980#issuecomment-220620841 Thanks for the update @greghogan. I left a few comments for improvements of the docs. After these are addressed it should be good to merge. > Jaccard Similarity > -- > > Key: FLINK-3780 > URL: https://issues.apache.org/jira/browse/FLINK-3780 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Implement a Jaccard Similarity algorithm computing all non-zero similarity > scores. This algorithm is similar to {{TriangleListing}} but instead of > joining two-paths against an edge list we count two-paths. > {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}} which > relies on {{Graph.getTriplets()}} so only computes similarity scores for > neighbors but not neighbors-of-neighbors. > This algorithm is easily modified for other similarity scores such as > Adamic-Adar similarity where the sum of endpoint degrees is replaced by the > degree of the middle vertex.
[GitHub] flink pull request: [FLINK-3780] [gelly] Jaccard Similarity
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64049343

--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -2055,22 +2055,22 @@ vertex and edge in the output graph stores the common group value and the number

 ### Jaccard Index

 Overview

-The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to
-1.0 (all neighbors are common).
+The Jaccard Index measures the similarity between vertex neighborhoods and is computed as the number of shared numbers
+divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are
+shared).

 Details

-Counting common neighbors for pairs of vertices is equivalent to counting the two-paths consisting of two edges
-connecting the two vertices to the common neighbor. The number of distinct neighbors for pairs of vertices is computed
-by storing the sum of degrees of the vertex pair and subtracting the count of common neighbors, which are double-counted
-in the sum of degrees.
+Counting shared neighbors for pairs of vertices is equivalent to counting connecting paths of length two. The number of
+distinct neighbors is computed by storing the sum of degrees of the vertex pair and subtracting the count of shared
+neighbors, which are double-counted in the sum of degrees.

-The algorithm first annotates each edge with the endpoint degree. Grouping on the midpoint vertex, each pair of
-neighbors is emitted with the endpoint degree sum. Grouping on two-paths, the common neighbors are counted.
+The algorithm first annotates each edge with the target vertex's degree. Grouping on the source vertex, each pair of
+neighbors is emitted with the degree sum. Grouping on two-paths, the shared neighbors are counted.

 Usage

 The algorithm takes a simple, undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs,
-the number of common neighbors, and the number of distinct neighbors. The graph ID type must be `Comparable` and
-`Copyable`.
+the number of shared neighbors, and the number of distinct neighbors. The result class provides a method to compute the
+Jaccard Index score. The graph ID type must be `Comparable` and `Copyable`.

--- End diff --

Here we should also document what is the output of the algorithm, i.e. the `Result` type and how to get the jaccard similarity out of it.
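The score described in the docs above can be cross-checked with a small stand-alone sketch: the distinct-neighbor count is the degree sum minus the double-counted shared neighbors. The class and method names here are hypothetical, not Gelly's `JaccardIndex` or `Result` types:

```java
import java.util.*;

/** Illustrative sketch of the Jaccard Index score (hypothetical class). */
public class JaccardSketch {
    /** Score from the counts the algorithm produces: shared neighbors plus
     *  the two degrees, with distinct = degree sum minus shared (which are
     *  double-counted in the sum of degrees). */
    public static double score(int sharedNeighbors, int degreeU, int degreeV) {
        int distinctNeighbors = degreeU + degreeV - sharedNeighbors;
        return distinctNeighbors == 0 ? 0.0 : (double) sharedNeighbors / distinctNeighbors;
    }

    /** Direct set definition for cross-checking: |A ∩ B| / |A ∪ B|. */
    public static double score(Set<Integer> a, Set<Integer> b) {
        Set<Integer> shared = new HashSet<>(a);
        shared.retainAll(b);
        return score(shared.size(), a.size(), b.size());
    }
}
```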
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293432#comment-15293432 ] ASF GitHub Bot commented on FLINK-3780: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64048955 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java --- @@ -43,11 +43,13 @@ import java.util.List; /** - * The Jaccard Index measures the similarity between vertex neighborhoods. - * Scores range from 0.0 (no common neighbors) to 1.0 (all neighbors are common). + * The Jaccard Index measures the similarity between vertex neighborhoods and + * is computed as the number of shared numbers divided by the number of --- End diff -- numbers -> neighbors
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293431#comment-15293431 ] ASF GitHub Bot commented on FLINK-3780: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64048499 --- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/JaccardIndex.java --- @@ -54,73 +57,112 @@ public static final boolean DEFAULT_CLIP_AND_FLIP = true; + private static void printUsage() { + System.out.println(WordUtils.wrap("The Jaccard Index measures the similarity between vertex" + + " neighborhoods and is computed as the number of shared numbers divided by the number of" + --- End diff -- numbers -> neighbors
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293430#comment-15293430 ] ASF GitHub Bot commented on FLINK-3780: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64048412 --- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/JaccardIndex.java --- @@ -40,9 +43,9 @@ /** * Driver for the library implementation of Jaccard Index. * - * This example generates an undirected RMat graph with the given scale and - * edge factor then calculates all non-zero Jaccard Index similarity scores - * between vertices. + * This example reads a simple, undirected graph from a CSV file or generates --- End diff -- remove one "generates"
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293429#comment-15293429 ] ASF GitHub Bot commented on FLINK-3780: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64048334 --- Diff: docs/apis/batch/libs/gelly.md --- @@ -2250,14 +2250,33 @@ graph.run(new TranslateVertexValues(new LongValueAddOffset(vertexCount))); - translate.TranslateEdgeValues + asm.translate.TranslateEdgeValues Translate edge values using the given TranslateFunction. {% highlight java %} graph.run(new TranslateEdgeValues(new Nullify())); {% endhighlight %} + + + library.similarity.JaccardIndex + +Measures the similarity between vertex neighborhoods. The Jaccard Index score is computed as the number of shared numbers divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are shared). --- End diff -- Why did you add this here and not in the "Usage" section of the library method? I find it a bit confusing... You describe graph algorithms as building blocks for other algorithms. Does Jaccard index fall in this category?
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293424#comment-15293424 ] ASF GitHub Bot commented on FLINK-3780: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64047578 --- Diff: docs/apis/batch/libs/gelly.md --- @@ -2055,22 +2055,22 @@ vertex and edge in the output graph stores the common group value and the number ### Jaccard Index Overview -The Jaccard Index measures the similarity between vertex neighborhoods. Scores range from 0.0 (no common neighbors) to -1.0 (all neighbors are common). +The Jaccard Index measures the similarity between vertex neighborhoods and is computed as the number of shared numbers --- End diff -- shared numbers -> shared *neighbors?
[GitHub] flink pull request: [FLINK-3780] [gelly] Jaccard Similarity
Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1980#issuecomment-220600881 @vasia all updates should be in place.
[jira] [Commented] (FLINK-3780) Jaccard Similarity
[ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293324#comment-15293324 ] ASF GitHub Bot commented on FLINK-3780: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1980#discussion_r64037567 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/asm/degree/annotate/TranslateEdgeDegreeToIntValue.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.asm.degree.annotate; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.asm.translate.TranslateFunction; +import org.apache.flink.types.IntValue; +import org.apache.flink.types.LongValue; + +/** + * Translate the edge degree returned by the degree annotation functions from + * {@link LongValue} to {@link IntValue}. + * + * @param edge value type + */ +public class TranslateEdgeDegreeToIntValue --- End diff -- I thought on this for a while, and decided that the best option is to subsume the translation in the next operation, and discovered that I had already made that change. So the class just needs to be removed. 
[jira] [Created] (FLINK-3944) Add optimization rules to reorder Cartesian products and joins
Fabian Hueske created FLINK-3944: Summary: Add optimization rules to reorder Cartesian products and joins Key: FLINK-3944 URL: https://issues.apache.org/jira/browse/FLINK-3944 Project: Flink Issue Type: Bug Components: Table API Affects Versions: 1.1.0 Reporter: Fabian Hueske Priority: Critical Fix For: 1.1.0 Currently, we do not support the execution of Cartesian products. Because we do not optimize the order of joins (due to missing statistics), joins are executed in the order in which they are specified. This works well for the Table API, however it can be problematic in case of SQL queries where the order of tables in the FROM clause should not matter. In case of SQL queries, it can happen that the optimized plan contains Cartesian products because joins are not reordered. If we add optimization rules that switch Cartesian products and joins, such situations can be resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3943) Add support for EXCEPT (set minus)
Fabian Hueske created FLINK-3943: Summary: Add support for EXCEPT (set minus) Key: FLINK-3943 URL: https://issues.apache.org/jira/browse/FLINK-3943 Project: Flink Issue Type: New Feature Components: Table API Affects Versions: 1.1.0 Reporter: Fabian Hueske Priority: Minor Currently, the Table API and SQL do not support EXCEPT. EXCEPT can be executed as a coGroup on all fields that forwards records of the first input if the second input is empty. In order to add support for EXCEPT to the Table API and SQL we need to: - Implement a {{DataSetMinus}} class that translates an EXCEPT into a DataSet API program using a coGroup on all fields. - Implement a {{DataSetMinusRule}} that translates a Calcite {{LogicalMinus}} into a {{DataSetMinus}}. - Extend the Table API (and validation phase) to provide an except() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
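The coGroup strategy described in FLINK-3943 — group both inputs on all fields and forward a left-hand record only when the right-hand group is empty — can be sketched without Flink in plain Java. Class and method names are illustrative, not the proposed `DataSetMinus`:

```java
import java.util.*;

/** Illustrative sketch (not Flink code): EXCEPT as a coGroup on all fields. */
public class ExceptSketch {
    // Treat the full record as the grouping key; emit a left record once
    // when the right-hand group for that key is empty.
    public static <T> List<T> except(List<T> left, List<T> right) {
        Set<T> rightKeys = new HashSet<>(right);
        Set<T> out = new LinkedHashSet<>();   // set-semantics EXCEPT deduplicates
        for (T record : left) {
            if (!rightKeys.contains(record)) {
                out.add(record);
            }
        }
        return new ArrayList<>(out);
    }
}
```

In the actual DataSet API program this grouping would be a distributed coGroup rather than an in-memory hash set.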
[jira] [Created] (FLINK-3942) Add support for INTERSECT
Fabian Hueske created FLINK-3942: Summary: Add support for INTERSECT Key: FLINK-3942 URL: https://issues.apache.org/jira/browse/FLINK-3942 Project: Flink Issue Type: New Feature Components: Table API Affects Versions: 1.1.0 Reporter: Fabian Hueske Priority: Minor Currently, the Table API and SQL do not support INTERSECT. INTERSECT can be executed as a join on all fields. In order to add support for INTERSECT to the Table API and SQL we need to: - Implement a {{DataSetIntersect}} class that translates an INTERSECT into a DataSet API program using a join on all fields. - Implement a {{DataSetIntersectRule}} that translates a Calcite {{LogicalIntersect}} into a {{DataSetIntersect}}. - Extend the Table API (and validation phase) to provide an intersect() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
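The join-on-all-fields strategy from FLINK-3942 can be sketched the same way: a record is emitted once if it finds a join partner in the other input. Again, the names are illustrative, not the proposed `DataSetIntersect`:

```java
import java.util.*;

/** Illustrative sketch (not Flink code): INTERSECT as a join on all fields. */
public class IntersectSketch {
    // Join both inputs on the full record; emit each matching record once
    // (set semantics, as for SQL INTERSECT without ALL).
    public static <T> List<T> intersect(List<T> left, List<T> right) {
        Set<T> rightKeys = new HashSet<>(right);
        Set<T> out = new LinkedHashSet<>();
        for (T record : left) {
            if (rightKeys.contains(record)) {
                out.add(record);
            }
        }
        return new ArrayList<>(out);
    }
}
```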
[jira] [Created] (FLINK-3941) Add support for UNION (with duplicate elimination)
Fabian Hueske created FLINK-3941: Summary: Add support for UNION (with duplicate elimination) Key: FLINK-3941 URL: https://issues.apache.org/jira/browse/FLINK-3941 Project: Flink Issue Type: New Feature Components: Table API Affects Versions: 1.1.0 Reporter: Fabian Hueske Priority: Minor Currently, only UNION ALL is supported by Table API and SQL. UNION (with duplicate elimination) can be supported by applying a {{DataSet.distinct()}} after the union on all fields. This issue includes: - Extending {{DataSetUnion}} - Relaxing {{DataSetUnionRule}} to translate non-all unions. - Extending the Table API with a union() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
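The approach in FLINK-3941 — run the union as UNION ALL, then apply a distinct on all fields — is simple enough to show in miniature (illustrative names, not the proposed `DataSetUnion` extension):

```java
import java.util.*;

/** Illustrative sketch (not Flink code): UNION = UNION ALL + distinct(). */
public class UnionSketch {
    public static <T> List<T> unionDistinct(List<T> left, List<T> right) {
        // Concatenate both inputs (UNION ALL) ...
        List<T> all = new ArrayList<>(left);
        all.addAll(right);
        // ... then eliminate duplicates on all fields, as DataSet.distinct() would.
        return new ArrayList<>(new LinkedHashSet<>(all));
    }
}
```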
[jira] [Updated] (FLINK-3940) Add support for ORDER BY OFFSET FETCH
[ https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske updated FLINK-3940: - Priority: Minor (was: Major) > Add support for ORDER BY OFFSET FETCH > - > > Key: FLINK-3940 > URL: https://issues.apache.org/jira/browse/FLINK-3940 > Project: Flink > Issue Type: New Feature > Components: Table API >Affects Versions: 1.1.0 >Reporter: Fabian Hueske >Priority: Minor > > Currently only ORDER BY without OFFSET and FETCH are supported. > This issue tracks the effort to add support for OFFSET and FETCH and involves: > - Implementing the execution strategy in `DataSetSort` > - adapting the `DataSetSortRule` to support OFFSET and FETCH > - extending the Table API and validation to support OFFSET and FETCH and > generate a corresponding RelNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3940) Add support for ORDER BY OFFSET FETCH
[ https://issues.apache.org/jira/browse/FLINK-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske updated FLINK-3940: - Issue Type: New Feature (was: Bug) > Add support for ORDER BY OFFSET FETCH > - > > Key: FLINK-3940 > URL: https://issues.apache.org/jira/browse/FLINK-3940 > Project: Flink > Issue Type: New Feature > Components: Table API >Affects Versions: 1.1.0 >Reporter: Fabian Hueske > > Currently only ORDER BY without OFFSET and FETCH are supported. > This issue tracks the effort to add support for OFFSET and FETCH and involves: > - Implementing the execution strategy in `DataSetSort` > - adapting the `DataSetSortRule` to support OFFSET and FETCH > - extending the Table API and validation to support OFFSET and FETCH and > generate a corresponding RelNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3940) Add support for ORDER BY OFFSET FETCH
Fabian Hueske created FLINK-3940: Summary: Add support for ORDER BY OFFSET FETCH Key: FLINK-3940 URL: https://issues.apache.org/jira/browse/FLINK-3940 Project: Flink Issue Type: Bug Components: Table API Affects Versions: 1.1.0 Reporter: Fabian Hueske Currently only ORDER BY without OFFSET and FETCH are supported. This issue tracks the effort to add support for OFFSET and FETCH and involves: - Implementing the execution strategy in `DataSetSort` - adapting the `DataSetSortRule` to support OFFSET and FETCH - extending the Table API and validation to support OFFSET and FETCH and generate a corresponding RelNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
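Semantically, the execution strategy FLINK-3940 asks for in `DataSetSort` is sort, then drop the first OFFSET rows, then keep at most FETCH rows. A minimal single-machine sketch of that semantics (illustrative, not the `DataSetSort` implementation):

```java
import java.util.*;
import java.util.stream.Collectors;

/** Illustrative sketch: ORDER BY ... OFFSET o ROWS FETCH NEXT f ROWS ONLY. */
public class OffsetFetchSketch {
    public static List<Integer> orderByOffsetFetch(List<Integer> rows, long offset, long fetch) {
        return rows.stream()
                .sorted()         // ORDER BY (ascending)
                .skip(offset)     // OFFSET: drop the first o rows
                .limit(fetch)     // FETCH: keep at most f rows
                .collect(Collectors.toList());
    }
}
```

In a distributed setting the interesting part is that OFFSET/FETCH must be applied after a global ordering is established, which is why the issue ties it to the sort operator.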
[jira] [Commented] (FLINK-3475) DISTINCT aggregate function support
[ https://issues.apache.org/jira/browse/FLINK-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293052#comment-15293052 ] Fabian Hueske commented on FLINK-3475: -- DISTINCT aggregates can be computed by sorting the reduce group on the distinct attribute (secondary sort) and not considering duplicate values. A first step would be to add support for a single distinct attribute (groups can only be primarily sorted on one attribute). In case of multiple distinct aggregates, we have to split the aggregation into several group reduce operators and join the result afterwards. The join can be done locally and in a streamed merge join (partitioning and sorting will be preserved). > DISTINCT aggregate function support > --- > > Key: FLINK-3475 > URL: https://issues.apache.org/jira/browse/FLINK-3475 > Project: Flink > Issue Type: Sub-task > Components: Table API >Reporter: Chengxiang Li >Assignee: Chengxiang Li > > DISTINCT aggregate function may be able to reuse the aggregate function > instead of separate implementation, and let Flink runtime take care of > duplicate records. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
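The approach described above (secondary-sort the group on the distinct attribute and skip duplicate values while reducing) can be illustrated in plain Python. This is a sketch of the idea only; the function and its parameters are hypothetical, not Flink API:

```python
def distinct_aggregate(group, distinct_key, agg):
    """Sort the group on the distinct attribute (secondary sort) and skip
    consecutive duplicates, so each distinct value is aggregated exactly once."""
    acc = None
    prev = object()  # sentinel that never equals a real key value
    for row in sorted(group, key=distinct_key):
        k = distinct_key(row)
        if k == prev:
            continue  # duplicate of the previous distinct value: ignore it
        prev = k
        acc = k if acc is None else agg(acc, k)
    return acc

# SUM(DISTINCT x) over a group containing a duplicate: 1 + 2 + 3 = 6
print(distinct_aggregate([{"x": 2}, {"x": 1}, {"x": 2}, {"x": 3}],
                         distinct_key=lambda r: r["x"],
                         agg=lambda a, b: a + b))  # 6
```

Because the group is sorted, duplicates are adjacent, so a single pass with one "previous value" suffices; this is exactly why a single distinct attribute is the easy first step, while multiple distinct aggregates need separate group-reduce operators joined afterwards.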
[jira] [Commented] (FLINK-3939) Prevent distinct aggregates and grouping sets from being translated
[ https://issues.apache.org/jira/browse/FLINK-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293029#comment-15293029 ] ASF GitHub Bot commented on FLINK-3939: --- GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/2014 [FLINK-3939] [tableAPI] Prevent translation of unsupported distinct aggregates and grouping sets. PR fixes `DataSetAggregateRule` to not translate `LogicalAggregate` operators that include distinct aggregates or grouping sets. Such aggregations are currently not supported by `DataSetAggregate`. - [X] General - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [X] Documentation - Bug fix does not require docs - [X] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhueske/flink tableDistAggs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2014.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2014 commit ed9d68dece81346978564bf4185e97f7a2cf6e74 Author: Fabian HueskeDate: 2016-05-19T22:14:00Z [FLINK-3939] [tableAPI] Prevent translation of unsupported distinct aggregates and grouping sets. > Prevent distinct aggregates and grouping sets from being translated > --- > > Key: FLINK-3939 > URL: https://issues.apache.org/jira/browse/FLINK-3939 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Fabian Hueske >Assignee: Fabian Hueske > Fix For: 1.1.0 > > > Flink's SQL interface is currently not capable of executing distinct > aggregates and grouping sets. 
> We need to prevent queries with these operations from being translated, by > adapting the DataSetAggregateRule. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
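The fix in the PR is essentially a guard in `DataSetAggregateRule` that declines to match a `LogicalAggregate` containing distinct aggregate calls or grouping sets. The real rule is Scala code against Calcite's relational algebra; the check can be sketched language-neutrally like this, where the `Aggregate`/`AggCall` shapes are simplified stand-ins, not Calcite classes:

```python
from dataclasses import dataclass, field

@dataclass
class AggCall:
    name: str
    is_distinct: bool = False

@dataclass
class Aggregate:
    agg_calls: list
    group_sets: list = field(default_factory=list)  # >1 entry means grouping sets

def matches(agg):
    """Refuse to translate aggregations the DataSet backend cannot execute."""
    if any(call.is_distinct for call in agg.agg_calls):
        return False  # distinct aggregates are unsupported
    if len(agg.group_sets) > 1:
        return False  # grouping sets are unsupported
    return True

print(matches(Aggregate([AggCall("SUM")], group_sets=[[0]])))           # True
print(matches(Aggregate([AggCall("COUNT", is_distinct=True)], [[0]])))  # False
```

With the rule refusing to match, the planner fails the query up front instead of producing a `DataSetAggregate` that silently ignores the DISTINCT qualifier or the grouping sets.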
[jira] [Commented] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests
[ https://issues.apache.org/jira/browse/FLINK-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292998#comment-15292998 ] ASF GitHub Bot commented on FLINK-3909: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/2003#issuecomment-220545613 The Travis builds here also segfault in all RocksDB tests. I'm assuming these tests failed silently in the past, since I can't make any connection to the test plugin changes. > Maven Failsafe plugin may report SUCCESS on failed tests > > > Key: FLINK-3909 > URL: https://issues.apache.org/jira/browse/FLINK-3909 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > The following build completed successfully on Travis but there are actually > test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3933) Add an auto-type-extracting DeserializationSchema
[ https://issues.apache.org/jira/browse/FLINK-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292966#comment-15292966 ] ASF GitHub Bot commented on FLINK-3933: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2010 > Add an auto-type-extracting DeserializationSchema > - > > Key: FLINK-3933 > URL: https://issues.apache.org/jira/browse/FLINK-3933 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.1.0 > > > When creating a {{DeserializationSchema}}, people need to manually worry > about how to provide the produced type's {{TypeInformation}}. > We should add a base utility {{AbstractDeserializationSchema}} that provides > that automatically via the type extractor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
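The idea of the ticket, a base class that derives the produced type from the subclass's generic parameter so users need not supply `TypeInformation` by hand, can be mimicked with Python's typing machinery. This is an illustration of the pattern only; the class below is hypothetical and not Flink's `AbstractDeserializationSchema`:

```python
from typing import Generic, TypeVar, get_args

T = TypeVar("T")

class AutoTypedSchema(Generic[T]):
    """Base class that recovers T from the concrete subclass declaration,
    analogous to deriving TypeInformation via the type extractor."""
    @classmethod
    def produced_type(cls):
        # Walk the generic bases of the concrete subclass to find T.
        for base in getattr(cls, "__orig_bases__", ()):
            args = get_args(base)
            if args:
                return args[0]
        raise TypeError("could not determine the produced type")

class IntSchema(AutoTypedSchema[int]):
    def deserialize(self, data: bytes) -> int:
        return int.from_bytes(data, "big")

print(IntSchema.produced_type())  # <class 'int'>
```

The subclass declares its element type exactly once, in the `extends`-style clause, and the base class extracts it reflectively, which is the convenience the JIRA asks for.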
[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API
[ https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292881#comment-15292881 ] ASF GitHub Bot commented on FLINK-3650: --- Github user ramkrish86 commented on a diff in the pull request: https://github.com/apache/flink/pull/1856#discussion_r6491 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java --- @@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception { for (int position : fields) { // Save position of compared key // Get both values - both implement comparable - Comparable comparable1 = value1.getFieldNotNull(position); - Comparable comparable2 = value2.getFieldNotNull(position); + Comparable comparable1 = ((Tuple)value1).getFieldNotNull(position); --- End diff -- @tillrohrmann - I have made the needed updates, and maxBy and minBy now work with Scala tuples. It took some time, as I was busy with other things, but I found time to complete this. In the process I learned some Scala too. Let me know what you think of this last commit. > Add maxBy/minBy to Scala DataSet API > > > Key: FLINK-3650 > URL: https://issues.apache.org/jira/browse/FLINK-3650 > Project: Flink > Issue Type: Improvement > Components: Java API, Scala API >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: ramkrishna.s.vasudevan > > The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. > These methods are not supported by the Scala DataSet API. These methods > should be added in order to have a consistent API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
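The semantics under discussion, selecting the element whose values at the given field positions are minimal when compared position by position, as in the pairwise reduce of `SelectByMinFunction` shown in the diff, can be sketched in plain Python. The function below is a hypothetical illustration, not the Flink implementation:

```python
from functools import reduce

def select_by_min(elements, positions):
    """Reduce to the element whose fields at the given positions are smallest,
    comparing the positions in order and keeping the left element on a full
    tie (mirrors the pairwise reduce in SelectByMinFunction)."""
    def pick(a, b):
        for p in positions:
            if a[p] < b[p]:
                return a
            if a[p] > b[p]:
                return b
        return a  # all compared fields equal: keep the first-seen element
    return reduce(pick, elements)

data = [(3, "c"), (1, "b"), (1, "a"), (2, "z")]
# Primary key: field 0 (ties on 1), tie-break: field 1 ('a' < 'b').
print(select_by_min(data, [0, 1]))  # (1, 'a')
```

maxBy is the mirror image with the comparisons flipped; exposing both on the Scala DataSet API gives it parity with the stable Java API.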