[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-115133956 Thanks, seems like all is fine now. We will start reviewing this in the next few days. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115207592 You don't do it. I think it's best at the moment to only make the histograms available within the ml package. Everyone who wants to use them can then add `flink-ml` as a dependency.
[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/867#issuecomment-115228662 Also, apparently no tests were ever run after these changes. They all fail on the build server, even on basic checkstyle rules.
[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer
[ https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601093#comment-14601093 ]

ASF GitHub Bot commented on FLINK-2230:
---------------------------------------

Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/867#issuecomment-115228662

Also, apparently no tests were ever run after these changes. They all fail on the build server, even on basic checkstyle rules.

Add Support for Null-Values in TupleSerializer
----------------------------------------------

                 Key: FLINK-2230
                 URL: https://issues.apache.org/jira/browse/FLINK-2230
             Project: Flink
          Issue Type: Sub-task
            Reporter: Shiti Saxena
            Assignee: Shiti Saxena
            Priority: Minor

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/867#issuecomment-115248743 @StephanEwen hinted that the best way to go would be to decouple the RowTypeInfo completely from the TupleTypeInfo/TupleSerializerBase. This way, we get null-value support in the Table API without changing the existing code for tuples. This would require changing the RowTypeInfo to no longer be a child of CaseClassTypeInfo and creating a non-subclass RowSerializer and RowComparator.
[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer
[ https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601127#comment-14601127 ]

ASF GitHub Bot commented on FLINK-2230:
---------------------------------------

Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/867#issuecomment-115248743

@StephanEwen hinted that the best way to go would be to decouple the RowTypeInfo completely from the TupleTypeInfo/TupleSerializerBase. This way, we get null-value support in the Table API without changing the existing code for tuples. This would require changing the RowTypeInfo to no longer be a child of CaseClassTypeInfo and creating a non-subclass RowSerializer and RowComparator.

Add Support for Null-Values in TupleSerializer
----------------------------------------------

                 Key: FLINK-2230
                 URL: https://issues.apache.org/jira/browse/FLINK-2230
             Project: Flink
          Issue Type: Sub-task
            Reporter: Shiti Saxena
            Assignee: Shiti Saxena
            Priority: Minor
[GitHub] flink pull request: New operator state interfaces
Github user gyfora commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r33256890

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+        StreamOperatorState<S, C> {
+
+    // KeySelector for getting the state partition key for each input
+    private final KeySelector<IN, Serializable> keySelector;
+
+    private final PartitionedStateStore<S, C> stateStore;
+
+    private S defaultState;
+
+    // The currently processed input, used to extract the appropriate key
+    private IN currentInput;
+
+    public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+            StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+        super(checkpointer, provider);
+        this.keySelector = keySelector;
+        this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+    }
+
+    @SuppressWarnings("unchecked")
+    public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+            KeySelector<IN, Serializable> keySelector) {
+        this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+    }
+
+    @Override
+    public S getState() {
+        if (currentInput == null) {
+            return null;
+        } else {
+            try {
+                Serializable key = keySelector.getKey(currentInput);
+                if (stateStore.containsKey(key)) {
+                    return stateStore.getStateForKey(key);
+                } else {
+                    return defaultState;
+                }
+            } catch (Exception e) {
+                throw new RuntimeException(e);
+            }
+        }
+    }
+
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

In this case the exception caught is thrown by the key selector, which would have thrown the same exception in the partitioner at the previous output anyway. There is no reason to propagate this exception.
[jira] [Commented] (FLINK-2255) In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec
[ https://issues.apache.org/jira/browse/FLINK-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601038#comment-14601038 ]

ASF GitHub Bot commented on FLINK-2255:
---------------------------------------

Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/857

In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec
----------------------------------------------------------------------------------------------------------------------------------

                 Key: FLINK-2255
                 URL: https://issues.apache.org/jira/browse/FLINK-2255
             Project: Flink
          Issue Type: Bug
          Components: Examples, Streaming
            Reporter: Gabor Gevay
            Assignee: Gabor Gevay
            Priority: Minor

The event times are generated by System.currentTimeMillis(), so evictionSec should be multiplied by 1000 when passing it to Time.of.
[jira] [Commented] (FLINK-1956) Runtime context not initialized in RichWindowMapFunction
[ https://issues.apache.org/jira/browse/FLINK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601037#comment-14601037 ]

ASF GitHub Bot commented on FLINK-1956:
---------------------------------------

Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/855

Runtime context not initialized in RichWindowMapFunction
--------------------------------------------------------

                 Key: FLINK-1956
                 URL: https://issues.apache.org/jira/browse/FLINK-1956
             Project: Flink
          Issue Type: Bug
          Components: Streaming
            Reporter: Daniel Bali
            Assignee: Márton Balassi
              Labels: context, runtime, streaming, window
             Fix For: 0.9

Trying to access the runtime context in a rich window map function results in an exception. The following snippet demonstrates the bug:
{code}
env.generateSequence(0, 1000)
    .window(Count.of(10))
    .mapWindow(new RichWindowMapFunction<Long, Tuple2<Long, Long>>() {
        @Override
        public void mapWindow(Iterable<Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
            long self = getRuntimeContext().getIndexOfThisSubtask();
            for (long value : input) {
                out.collect(new Tuple2(self, value));
            }
        }
    }).flatten().print();
{code}
[GitHub] flink pull request: [FLINK-2255] [streaming] Fixed a bug in TopSpe...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/857
[GitHub] flink pull request: [streaming] Properly forward rich window funct...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/855
[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115210476 Okay. So I guess we can leave adding a createHistogram function to DataSetUtils for now [it would also require utilizing FlinkMLTools.block for an efficient implementation]. Pending that, this PR is ready to merge. Please have a look for any other modifications that are needed.
[jira] [Closed] (FLINK-2255) In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec
[ https://issues.apache.org/jira/browse/FLINK-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Márton Balassi closed FLINK-2255.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 0.10

Fixed via af05b94d0d

In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec
----------------------------------------------------------------------------------------------------------------------------------

                 Key: FLINK-2255
                 URL: https://issues.apache.org/jira/browse/FLINK-2255
             Project: Flink
          Issue Type: Bug
          Components: Examples, Streaming
            Reporter: Gabor Gevay
            Assignee: Gabor Gevay
            Priority: Minor
             Fix For: 0.10

The event times are generated by System.currentTimeMillis(), so evictionSec should be multiplied by 1000 when passing it to Time.of.
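The unit mismatch described in the issue above can be illustrated with a small sketch. All class and method names here are hypothetical stand-ins, not the actual Flink API; the point is only the millisecond/second conversion the fix applies:

```java
// Event timestamps come from System.currentTimeMillis(), i.e. milliseconds.
// An eviction interval given in seconds must therefore be scaled by 1000
// before being compared against those timestamps.
public class EvictionUnitSketch {

    // Convert a seconds-based eviction interval into milliseconds.
    static long evictionIntervalMillis(long evictionSec) {
        return evictionSec * 1000L;
    }

    // A window keeps an element whose timestamp lies within the eviction
    // interval of the current time.
    static boolean keep(long elementTimestampMillis, long nowMillis, long evictionSec) {
        return nowMillis - elementTimestampMillis <= evictionIntervalMillis(evictionSec);
    }

    public static void main(String[] args) {
        long now = 10_000L; // "current time": 10 seconds, in milliseconds

        // With the unit fix, an element 5 seconds old survives a 10-second window.
        System.out.println(keep(5_000L, now, 10)); // true

        // Without the scaling, the raw value 10 would be treated as 10 ms,
        // so the same element would be evicted: 5000 <= 10 is false.
        System.out.println(now - 5_000L <= 10); // false
    }
}
```

Without the multiplication, every element older than a few milliseconds is evicted immediately, which is exactly why each window in the example held only one element.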
[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/867#issuecomment-115227753 We actually had a discussion about this quite a few times. I also raised my concerns in the discussion of the issue, to which no one reacted. The serialization subsystem (and tuples) is among the most critical parts of Flink. There are so many side effects and considerations: comparators that interact with serializers, normalized keys, subclasses and tagging, object creation (GC impact). None of that is taken into account here. For something as crucial as this, we cannot make changes without discussing them thoroughly beforehand and, at best, also documenting them. It makes sense to do this before writing the code. Sorry if I appear like the bad guy here, but we are on the verge of getting spaghetti code and inconsistencies into one of the most crucial parts, and we cannot do that.
[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115202017 For the moment, I think it's best to place it under `org.apache.flink.ml.density`, for example.
[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer
[ https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601062#comment-14601062 ]

ASF GitHub Bot commented on FLINK-2230:
---------------------------------------

GitHub user Shiti opened a pull request: https://github.com/apache/flink/pull/867

[FLINK-2230] handling null values for TupleSerializer

When serializing, we add a `byte[]` (`BitSet.toByteArray`) before the fields which indicates the `null` fields. When deserializing, we fetch the `byte[]` and obtain the underlying `BitSet` (`BitSet.valueOf`); `BitSet.get(index)` is `false` when the value is `null`. For each field element we check whether it is marked as `null` in the `BitSet` and then pass it to the fieldSerializer.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Shiti/flink FLINK-2230

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/867.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #867

commit a4d2731eb75032f958e96323a075eb8bc7d11c73
Author: Shiti ssaxena@gmail.com
Date: 2015-06-25T10:36:10Z

    [FLINK-2230] handling null values for TupleSerializer

Add Support for Null-Values in TupleSerializer
----------------------------------------------

                 Key: FLINK-2230
                 URL: https://issues.apache.org/jira/browse/FLINK-2230
             Project: Flink
          Issue Type: Sub-task
            Reporter: Shiti Saxena
            Assignee: Shiti Saxena
            Priority: Minor
[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...
GitHub user Shiti opened a pull request: https://github.com/apache/flink/pull/867

[FLINK-2230] handling null values for TupleSerializer

When serializing, we add a `byte[]` (`BitSet.toByteArray`) before the fields which indicates the `null` fields. When deserializing, we fetch the `byte[]` and obtain the underlying `BitSet` (`BitSet.valueOf`); `BitSet.get(index)` is `false` when the value is `null`. For each field element we check whether it is marked as `null` in the `BitSet` and then pass it to the fieldSerializer.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Shiti/flink FLINK-2230

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/867.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #867

commit a4d2731eb75032f958e96323a075eb8bc7d11c73
Author: Shiti ssaxena@gmail.com
Date: 2015-06-25T10:36:10Z

    [FLINK-2230] handling null values for TupleSerializer
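The null-masking scheme described in the pull request can be sketched in a few lines. This is a hypothetical illustration of the idea only, not the actual TupleSerializer code: a `BitSet` with one bit per tuple field is serialized ahead of the field values, a set bit meaning "field is non-null":

```java
import java.util.BitSet;

// Sketch of the null-marking scheme from the PR description: a BitSet
// (one bit per field) is written before the field values. A set bit
// means the field is non-null; a clear bit means the field is null.
public class NullMaskSketch {

    // Build the mask for a row of field values, as written during serialization.
    static byte[] buildMask(Object[] fields) {
        BitSet mask = new BitSet(fields.length);
        for (int i = 0; i < fields.length; i++) {
            mask.set(i, fields[i] != null);
        }
        return mask.toByteArray();
    }

    // During deserialization, recover the BitSet and test a field's bit;
    // BitSet.get(index) == false means the field was null.
    static boolean isNull(byte[] maskBytes, int index) {
        return !BitSet.valueOf(maskBytes).get(index);
    }

    public static void main(String[] args) {
        Object[] row = {"a", null, 42};
        byte[] mask = buildMask(row);
        System.out.println(isNull(mask, 0)); // false
        System.out.println(isNull(mask, 1)); // true
        System.out.println(isNull(mask, 2)); // false
    }
}
```

Only the non-null fields would then be handed to the per-field serializers, since the mask already records which positions were null.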
[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer
[ https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601092#comment-14601092 ]

ASF GitHub Bot commented on FLINK-2230:
---------------------------------------

Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/867#issuecomment-115227753

We actually had a discussion about this quite a few times. I also raised my concerns in the discussion of the issue, to which no one reacted. The serialization subsystem (and tuples) is among the most critical parts of Flink. There are so many side effects and considerations: comparators that interact with serializers, normalized keys, subclasses and tagging, object creation (GC impact). None of that is taken into account here. For something as crucial as this, we cannot make changes without discussing them thoroughly beforehand and, at best, also documenting them. It makes sense to do this before writing the code. Sorry if I appear like the bad guy here, but we are on the verge of getting spaghetti code and inconsistencies into one of the most crucial parts, and we cannot do that.

Add Support for Null-Values in TupleSerializer
----------------------------------------------

                 Key: FLINK-2230
                 URL: https://issues.apache.org/jira/browse/FLINK-2230
             Project: Flink
          Issue Type: Sub-task
            Reporter: Shiti Saxena
            Assignee: Shiti Saxena
            Priority: Minor
[GitHub] flink pull request: New operator state interfaces
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r33253164

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
[...]
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

Yeah, I think this is not very nice to do. Every level of wrapping just makes the exceptions more horrible and the exception messages worse. This is an indicator that the signature of `updateState()`
[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115204540 How should I import a class in flink.ml.math from, say, flink-java? I tried adding flink-staging as a dependency to the pom.xml of flink-java, but to no avail. I'm not terribly familiar with Maven.
[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/818#issuecomment-115215460 Thank you @shghatge! I'll merge this :)
[jira] [Commented] (FLINK-2271) PageRank gives wrong results with weighted graph input
[ https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601137#comment-14601137 ]

ASF GitHub Bot commented on FLINK-2271:
---------------------------------------

Github user vasia commented on the pull request: https://github.com/apache/flink/pull/865#issuecomment-115252595

Thank you for looking at this @mxm :) I'll merge.

PageRank gives wrong results with weighted graph input
------------------------------------------------------

                 Key: FLINK-2271
                 URL: https://issues.apache.org/jira/browse/FLINK-2271
             Project: Flink
          Issue Type: Bug
          Components: Gelly
    Affects Versions: 0.9
            Reporter: Vasia Kalavri
            Assignee: Vasia Kalavri
             Fix For: 0.10

The current implementation of the PageRank algorithm expects a weighted edge list as input. However, if the edge weight is other than 1.0, this will result in wrong results. We should change the library method and corresponding examples (also GSAPageRank) to expect an unweighted graph and compute the transition probabilities correctly.
[GitHub] flink pull request: New operator state interfaces
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r33253332

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
[...]
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

It is a good idea to start adding these exceptions to the signatures, and this is a good point to start.
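The review suggestion — declaring the exception on the method instead of wrapping it in a RuntimeException — could look roughly like this. All types and names below are hypothetical stand-ins, not the actual Flink interfaces:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the reviewed suggestion: declare a checked exception on the
// state-update method so callers see the original failure, rather than
// wrapping every exception in a RuntimeException. Names are stand-ins.
interface KeyedState<S> {
    void updateState(S state) throws Exception;
}

public class MapBackedState<S> implements KeyedState<S> {
    private final Map<Object, S> store = new HashMap<>();
    private Object currentKey;

    public void setCurrentKey(Object key) {
        currentKey = key;
    }

    @Override
    public void updateState(S state) throws Exception {
        if (state == null) {
            throw new IllegalArgumentException("Cannot set state to null.");
        }
        if (currentKey == null) {
            throw new IllegalStateException("Need a valid input for updating a state.");
        }
        // Any failure from the underlying store propagates as-is,
        // with no extra level of RuntimeException wrapping.
        store.put(currentKey, state);
    }

    public S getState() {
        return store.get(currentKey);
    }
}
```

The trade-off is that every caller must now handle or re-declare the exception, which is presumably why the reviewer frames this as a starting point rather than a finished design.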
[GitHub] flink pull request: [tools] Make release script a bit more flexibl...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/868#issuecomment-115262360 Good changes, +1. There are some assumptions about the call order of the newly introduced functions though (like you have to call prepare & make_src_release [or be in the checked-out repo] in that order). I guess it's fine, we don't want to over-engineer this ;)
[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally
[ https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601159#comment-14601159 ]

Stephan Ewen commented on FLINK-2239:
-------------------------------------

There is pending work to support larger results for {{collect()}} by letting them go through the BLOB manager. That is still limited by client memory, though.

The concern about direct connections between client and workers is that this fails in many enterprise setups due to firewalls. We have seen multiple installations with edge servers. The client can communicate with the master, but not the workers.

I like the idea of {{iterate()}}. It would be a bit of an effort, but seems like a clean solution.

print() on DataSet: stream results and print incrementally
----------------------------------------------------------

                 Key: FLINK-2239
                 URL: https://issues.apache.org/jira/browse/FLINK-2239
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Runtime
    Affects Versions: 0.9
            Reporter: Maximilian Michels
             Fix For: 0.10

Users find it counter-intuitive that {{print()}} on a DataSet internally calls {{collect()}} and fully materializes the set. This leads to out-of-memory errors on the client. It also leaves users with the feeling that Flink cannot handle large amounts of data and that it fails frequently. Improving on this situation requires some major architectural changes in Flink. The easiest solution would probably be to transfer the data from the job manager to the client via the {{BlobManager}}. Alternatively, the client could directly connect to the task managers and fetch the results.
[jira] [Commented] (FLINK-2093) Add a difference method to Gelly's Graph class
[ https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601031#comment-14601031 ] ASF GitHub Bot commented on FLINK-2093: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/818#issuecomment-115215460 Thank you @shghatge! I'll merge this :) Add a difference method to Gelly's Graph class -- Key: FLINK-2093 URL: https://issues.apache.org/jira/browse/FLINK-2093 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Assignee: Shivani Ghatge Priority: Minor This method will compute the difference between two graphs, returning a new graph containing the vertices and edges that the current graph and the input graph don't have in common.
[GitHub] flink pull request: [tools] Make release script a bit more flexibl...
GitHub user rmetzger opened a pull request: https://github.com/apache/flink/pull/868 [tools] Make release script a bit more flexible You can merge this pull request into a Git repository by running: $ git pull https://github.com/rmetzger/flink release_script Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/868.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #868 commit 1cefd8abd3a0527f380f22db0baae6bebb2a952f Author: Robert Metzger rmetz...@apache.org Date: 2015-06-25T12:58:08Z [tools] Make release script a bit more flexible
[GitHub] flink pull request: [FLINK-2264] [gelly] changed the tests to use ...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/863#issuecomment-115194215 Thank you @samk3211! This looks good :) I see that, like here, @mjsax has also created a utils class for the new comparison methods in #866. Since all migrated tests will be using these methods, I will just move them to `TestBaseUtils` before merging this.
[jira] [Commented] (FLINK-2264) Migrate integration tests for Gelly
[ https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600944#comment-14600944 ] ASF GitHub Bot commented on FLINK-2264: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/863#issuecomment-115194215 Thank you @samk3211! This looks good :) I see that, like here, @mjsax has also created a utils class for the new comparison methods in #866. Since all migrated tests will be using these methods, I will just move them to `TestBaseUtils` before merging this. Migrate integration tests for Gelly --- Key: FLINK-2264 URL: https://issues.apache.org/jira/browse/FLINK-2264 Project: Flink Issue Type: Sub-task Components: Gelly, Tests Affects Versions: 0.10 Reporter: Vasia Kalavri Assignee: Samia Khalid Priority: Minor
[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally
[ https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600948#comment-14600948 ] Sebastian Kruse commented on FLINK-2239: I used the {{RemoteCollectorOutputFormat}} for this purpose and that always worked pretty well. However, the downside is that it uses Java RMI, which bypasses Flink's serialization stack and sometimes requires setting up the client address via {{-Djava.rmi.server.hostname}}. Additionally, I would like to remark that there is a more general issue behind this: if one wants to ship larger job results to the driver (e.g., in order to write them to a local DB), {{collect()}} also falls flat. Something like an {{iterate()}} method, which streams the result to the client without materializing it, would help in such cases. The proposed change to {{print()}} is then just a special instance of such an {{iterate()}} method. print() on DataSet: stream results and print incrementally -- Key: FLINK-2239 URL: https://issues.apache.org/jira/browse/FLINK-2239 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: 0.9 Reporter: Maximilian Michels Fix For: 0.10 Users find it counter-intuitive that {{print()}} on a DataSet internally calls {{collect()}} and fully materializes the set. This leads to out-of-memory errors on the client. It also leaves users with the feeling that Flink cannot handle large amounts of data and that it fails frequently. Improving on this situation requires some major architectural changes in Flink. The easiest solution would probably be to transfer the data from the job manager to the client via the {{BlobManager}}. Alternatively, the client could directly connect to the task managers and fetch the results.
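The {{iterate()}}-style access floated in these comments could, very roughly, take the following shape on the client side. This is an invented sketch (the class name, the `Supplier` list, and the partition-fetching mechanism are all assumptions for illustration), not a design from the thread:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Illustrative sketch of the proposed iterate()-style result access: the
// client consumes the result one partition at a time instead of fully
// materializing it the way collect() does. The Supplier list stands in
// for lazily fetched per-task-manager result chunks; names are invented.
class IterateSketch {
    static <T> Iterator<T> iterate(List<Supplier<Iterator<T>>> partitions) {
        final Iterator<Supplier<Iterator<T>>> parts = partitions.iterator();
        return new Iterator<T>() {
            private Iterator<T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // fetch the next partition only when the current one is drained
                while (!current.hasNext() && parts.hasNext()) {
                    current = parts.next().get();
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

In this picture, an incrementally printing {{print()}} is just {{iterate()}} followed by a loop that prints each element, which matches the "special instance" remark above.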
[jira] [Assigned] (FLINK-2108) Add score function for Predictors
[ https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Theodore Vasiloudis reassigned FLINK-2108: -- Assignee: Theodore Vasiloudis (was: Sachin Goel) Add score function for Predictors - Key: FLINK-2108 URL: https://issues.apache.org/jira/browse/FLINK-2108 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Theodore Vasiloudis Priority: Minor Labels: ML A score function for Predictor implementations should take a DataSet[(I, O)] and an (optional) scoring measure and return a score. The DataSet[(I, O)] would probably be the output of the predict function. For example in MultipleLinearRegression, we can call predict on a labeled dataset, get back predictions for each item in the data, and then call score with the resulting dataset as an argument and we should get back a score for the prediction quality, such as the R^2 score.
[jira] [Commented] (FLINK-2108) Add score function for Predictors
[ https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600951#comment-14600951 ] Theodore Vasiloudis commented on FLINK-2108: OK, I will take this then; the interface will be similar to what sklearn uses. Add score function for Predictors - Key: FLINK-2108 URL: https://issues.apache.org/jira/browse/FLINK-2108 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Sachin Goel Priority: Minor Labels: ML A score function for Predictor implementations should take a DataSet[(I, O)] and an (optional) scoring measure and return a score. The DataSet[(I, O)] would probably be the output of the predict function. For example in MultipleLinearRegression, we can call predict on a labeled dataset, get back predictions for each item in the data, and then call score with the resulting dataset as an argument and we should get back a score for the prediction quality, such as the R^2 score.
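A scoring measure of the kind described in the issue, here the R^2 score it mentions, might be sketched as follows. Plain arrays stand in for the DataSet[(I, O)] of (truth, prediction) pairs; nothing here is actual flink-ml API:

```java
// Hypothetical sketch of the R^2 score over (truth, prediction) pairs,
// computed on plain arrays instead of a Flink DataSet for brevity.
class ScoreSketch {
    // R^2 = 1 - SS_res / SS_tot
    static double rSquared(double[] truth, double[] prediction) {
        double mean = 0.0;
        for (double y : truth) {
            mean += y / truth.length;
        }
        double ssRes = 0.0; // residual sum of squares
        double ssTot = 0.0; // total sum of squares around the mean
        for (int i = 0; i < truth.length; i++) {
            ssRes += (truth[i] - prediction[i]) * (truth[i] - prediction[i]);
            ssTot += (truth[i] - mean) * (truth[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```

A perfect prediction yields 1.0; predicting the mean of the labels yields 0.0, which mirrors the convention sklearn uses for its `score` methods.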
[jira] [Created] (FLINK-2276) Travis build error
Sachin Goel created FLINK-2276: -- Summary: Travis build error Key: FLINK-2276 URL: https://issues.apache.org/jira/browse/FLINK-2276 Project: Flink Issue Type: Bug Reporter: Sachin Goel testExecutionFailsAfterTaskMarkedFailed fails on Travis. Here is the log output: https://s3.amazonaws.com/archive.travis-ci.org/jobs/68288986/log.txt
[GitHub] flink pull request: [FLINK-2116] [ml] Reusing predict operation fo...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/772#issuecomment-115198124 Actually I wouldn't call it predictSomething, because then we're again quite close to the former problem that we have a method whose semantics depend on the provided type. And this only confuses users. My concern is that the user does not really know what `predictLabeled` means. Apparently it is something similar to `predict` but with a label. But what is the label? Does it mean that I can apply `predict` on `T : Vector` and `predictLabeled` on `LabeledVector`? Does it mean that I get a labeled result type? But don't I already get it with `predict`? Do I have to provide a type with a label or can I also supply a vector? IMO, the prediction which also returns the true label value deserves a more distinguishable name than `predictSomething`, because it has different semantics. I can't think of something better than `evaluate` at the moment. But it makes it clear that the user has to provide some evaluation `DataSet`, meaning some labeled data.
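The semantic distinction argued for in this comment can be made concrete with a small sketch. The class and method bodies below are purely illustrative (plain lists stand in for DataSets, and a constant model keeps it runnable); they are not the actual flink-ml interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: contrasting the semantics of predict and
// evaluate as debated in the review. predict consumes unlabeled input;
// evaluate consumes labeled data and returns (true label, prediction)
// pairs, ready for a score function.
abstract class PredictorSketch {
    abstract double predictOne(double[] vector);

    // predict: unlabeled vectors in, one prediction per vector out
    List<Double> predict(List<double[]> input) {
        List<Double> out = new ArrayList<>();
        for (double[] v : input) {
            out.add(predictOne(v));
        }
        return out;
    }

    // evaluate: labeled data in, {true label, prediction} pairs out
    List<double[]> evaluate(List<double[]> vectors, List<Double> labels) {
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            out.add(new double[]{labels.get(i), predictOne(vectors.get(i))});
        }
        return out;
    }
}

class ConstantModel extends PredictorSketch {
    @Override
    double predictOne(double[] vector) {
        return 0.5; // a constant "model", purely for illustration
    }
}
```

With names like these, the question of "what is the label?" disappears: `evaluate` is the only method that requires labeled input.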
[jira] [Commented] (FLINK-2116) Make pipeline extension require less coding
[ https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600958#comment-14600958 ] ASF GitHub Bot commented on FLINK-2116: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/772#issuecomment-115198124 Actually I wouldn't call it predictSomething, because then we're again quite close to the former problem that we have a method whose semantics depend on the provided type. And this only confuses users. My concern is that the user does not really know what `predictLabeled` means. Apparently it is something similar to `predict` but with a label. But what is the label? Does it mean that I can apply `predict` on `T : Vector` and `predictLabeled` on `LabeledVector`? Does it mean that I get a labeled result type? But don't I already get it with `predict`? Do I have to provide a type with a label or can I also supply a vector? IMO, the prediction which also returns the true label value deserves a more distinguishable name than `predictSomething`, because it has different semantics. I can't think of something better than `evaluate` at the moment. But it makes it clear that the user has to provide some evaluation `DataSet`, meaning some labeled data.
Make pipeline extension require less coding --- Key: FLINK-2116 URL: https://issues.apache.org/jira/browse/FLINK-2116 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Mikio Braun Assignee: Till Rohrmann Priority: Minor Right now, implementing methods from the pipelines for new types, or even adding new methods to pipelines requires many steps: 1) implementing methods for new types - implement implicit of the corresponding class encapsulating the operation in the companion object 2) adding methods to the pipeline - adding a method - adding a trait for the operation - implement implicit in the companion object These are all objects which contain many generic parameters, so reducing the work would be great. The goal should be that you can really focus on the code to add, and have as little boilerplate code as possible.
[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115199291 Where should I place the Histogram implementations? Currently they are in {{org.apache.flink.ml.math}}, but I can't import them from flink-core, where DataSetUtils is located. Besides, since the purpose is to make the Histograms usable in general, they shouldn't be in the ml library.
[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/818
[GitHub] flink pull request: [FLINK-2264] [gelly] changed the tests to use ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/863
[jira] [Commented] (FLINK-2264) Migrate integration tests for Gelly
[ https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601972#comment-14601972 ] ASF GitHub Bot commented on FLINK-2264: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/863 Migrate integration tests for Gelly --- Key: FLINK-2264 URL: https://issues.apache.org/jira/browse/FLINK-2264 Project: Flink Issue Type: Sub-task Components: Gelly, Tests Affects Versions: 0.10 Reporter: Vasia Kalavri Assignee: Samia Khalid Priority: Minor
[jira] [Commented] (FLINK-2093) Add a difference method to Gelly's Graph class
[ https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601971#comment-14601971 ] ASF GitHub Bot commented on FLINK-2093: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/818 Add a difference method to Gelly's Graph class -- Key: FLINK-2093 URL: https://issues.apache.org/jira/browse/FLINK-2093 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Assignee: Shivani Ghatge Priority: Minor This method will compute the difference between two graphs, returning a new graph containing the vertices and edges that the current graph and the input graph don't have in common.
[GitHub] flink pull request: New operator state interfaces
Github user gyfora commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r33259296

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---

+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+        StreamOperatorState<S, C> {
+
+    // KeySelector for getting the state partition key for each input
+    private final KeySelector<IN, Serializable> keySelector;
+
+    private final PartitionedStateStore<S, C> stateStore;
+
+    private S defaultState;
+
+    // The currently processed input, used to extract the appropriate key
+    private IN currentInput;
+
+    public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+            StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+        super(checkpointer, provider);
+        this.keySelector = keySelector;
+        this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+    }
+
+    @SuppressWarnings("unchecked")
+    public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+            KeySelector<IN, Serializable> keySelector) {
+        this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+    }
+
+    @Override
+    public S getState() {
+        if (currentInput == null) {
+            return null;
+        } else {
+            try {
+                Serializable key = keySelector.getKey(currentInput);
+                if (stateStore.containsKey(key)) {
+                    return stateStore.getStateForKey(key);
+                } else {
+                    return defaultState;
+                }
+            } catch (Exception e) {
+                throw new RuntimeException(e);
+            }
+        }
+    }
+
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);

--- End diff --

Then I guess the getState method should throw an IOException as well
[GitHub] flink pull request: New operator state interfaces
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/747#issuecomment-115275285 I think we have no real blocker here. I would prefer that the exception issue be addressed (a message for the wrapping exception). Everything else will probably show best when we implement sample jobs and sample backends for this new functionality.
[jira] [Commented] (FLINK-2105) Implement Sort-Merge Outer Join algorithm
[ https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601254#comment-14601254 ] Chiwan Park commented on FLINK-2105: Hi [~r-pogalz], I think that this issue covers only the implementation of the iterators, not the integration. FLINK-687 should cover the integration with drivers and optimizers. We need a new Driver because an outer join returns a different result from an equi-join (MatchDriver). But the Driver is not for the sort-merge-based outer join only; the hash-based outer join will use the same Driver. If I understand correctly, a Driver returns the same result even though the strategy differs. Implement Sort-Merge Outer Join algorithm - Key: FLINK-2105 URL: https://issues.apache.org/jira/browse/FLINK-2105 Project: Flink Issue Type: Sub-task Components: Local Runtime Reporter: Fabian Hueske Assignee: Ricky Pogalz Priority: Minor Fix For: pre-apache Flink does not natively support outer joins at the moment. This issue proposes to implement a sort-merge outer join algorithm that can cover left, right, and full outer joins. The implementation can be based on the regular sort-merge join iterators ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}; see also the {{MatchDriver}} class). The Reusing and NonReusing variants differ in whether object instances are reused or new objects are created. I would start with the NonReusing variant, which is safer from a user's point of view and should also be easier to implement.
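The merge step of the proposed sort-merge outer join can be illustrated with a small standalone sketch over two inputs already sorted by key. This is not the MergeMatchIterator API itself, and it assumes keys are unique within each input (duplicate keys would require a cross-product of the matching groups, which the sketch omits):

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the merge step of a sort-merge full outer join over two
// key-sorted inputs, emitting "null" on the non-matching side. Left and
// right outer joins are the same loop with one of the drain phases dropped.
class OuterJoinSketch {
    static List<String> fullOuterJoin(int[] leftKeys, String[] leftVals,
                                      int[] rightKeys, String[] rightVals) {
        List<String> out = new ArrayList<>();
        int l = 0, r = 0;
        while (l < leftKeys.length && r < rightKeys.length) {
            if (leftKeys[l] < rightKeys[r]) {
                // left record has no partner on the right
                out.add(leftKeys[l] + ":" + leftVals[l] + "/null");
                l++;
            } else if (leftKeys[l] > rightKeys[r]) {
                // right record has no partner on the left
                out.add(rightKeys[r] + ":null/" + rightVals[r]);
                r++;
            } else {
                // matching keys: emit the joined pair
                out.add(leftKeys[l] + ":" + leftVals[l] + "/" + rightVals[r]);
                l++;
                r++;
            }
        }
        // drain whichever side still has records (the outer parts)
        for (; l < leftKeys.length; l++) {
            out.add(leftKeys[l] + ":" + leftVals[l] + "/null");
        }
        for (; r < rightKeys.length; r++) {
            out.add(rightKeys[r] + ":null/" + rightVals[r]);
        }
        return out;
    }
}
```

This also makes the comment's point visible: the merge logic is independent of which Driver invokes it, and an equi-join is the same loop with the two null-emitting branches removed.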
[GitHub] flink pull request: New operator state interfaces
Github user gyfora commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r33259615

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java --- (quotes the same hunk of PartitionedStreamOperatorState.java as the comment above, ending at the throw new RuntimeException(e) line in updateState)

--- End diff --

Okay, good point :)
[GitHub] flink pull request: New operator state interfaces
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/747#issuecomment-115268784 Should we merge this?
[GitHub] flink pull request: New operator state interfaces
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/747#issuecomment-115277122 Okay, I will fix the exceptions and will merge it afterwards.
[GitHub] flink pull request: New operator state interfaces
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r33259429

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java --- (quotes the same hunk of PartitionedStreamOperatorState.java as the first diff comment above, ending at the throw new RuntimeException(e) line in updateState)

--- End diff --

If we want to keep this open, then yes. Really depends on that decision. On the other hand, removing exceptions usually does not break code. Adding them does...
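The trade-off debated in these review comments, wrapping the cause in an unchecked exception versus declaring a checked one, can be made concrete with a small sketch. This is a standalone illustration, not the actual OperatorState interface:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of the two alternatives from the review: wrap the cause in an
// unchecked exception with a descriptive message (as the review asked
// for), or declare the failure in the signature. Names are illustrative.
class StateAccessSketch {
    // Alternative 1: wrap with a message; callers need no throws clause,
    // but failures only surface at runtime
    static <S> S getStateWrapping(Callable<S> fetch) {
        try {
            return fetch.call();
        } catch (Exception e) {
            throw new RuntimeException("Could not fetch partitioned state", e);
        }
    }

    // Alternative 2: declare a checked IOException; adding this to an
    // already-published interface breaks existing callers, which is the
    // point made about adding vs. removing exceptions
    static <S> S getStateChecked(Callable<S> fetch) throws IOException {
        try {
            return fetch.call();
        } catch (Exception e) {
            throw new IOException("Could not fetch partitioned state", e);
        }
    }
}
```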
[jira] [Resolved] (FLINK-2232) StormWordCountLocalITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax resolved FLINK-2232. Resolution: Fixed StormWordCountLocalITCase fails --- Key: FLINK-2232 URL: https://issues.apache.org/jira/browse/FLINK-2232 Project: Flink Issue Type: Bug Components: Streaming, Tests Affects Versions: master Reporter: Ufuk Celebi Assignee: Matthias J. Sax Priority: Minor https://travis-ci.org/apache/flink/jobs/66936476 {code} StormWordCountLocalITCaseStreamingProgramTestBase.testJobWithoutObjectReuse:109-postSubmit:40-TestBaseUtils.compareResultsByLinesInMemory:256-TestBaseUtils.compareResultsByLinesInMemory:270 Different number of lines in expected and obtained result. expected:801 but was:0 {code} Can we disable the test until this is fixed?
[jira] [Commented] (FLINK-2066) Make delay between execution retries configurable
[ https://issues.apache.org/jira/browse/FLINK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601529#comment-14601529 ] Nuno Miguel Marques dos Santos commented on FLINK-2066: --- Hi guys, I am going to start working on this issue. If I have any questions, I'll be sure to give a shout on the mailing list! Make delay between execution retries configurable - Key: FLINK-2066 URL: https://issues.apache.org/jira/browse/FLINK-2066 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 0.9 Reporter: Stephan Ewen Labels: starter Flink allows specifying a delay between execution retries. This helps to let some external failure causes fully manifest themselves before the restart is attempted. The delay is currently defined only system-wide. We should add it to the {{ExecutionConfig}} of a job to allow per-job specification.
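The per-job setting proposed in the issue would presumably end up as a field on ExecutionConfig next to the existing retry count, roughly along these lines. The setter name `setExecutionRetryDelay` is a guess, since the change had not been implemented at the time of this comment:

```java
// Hypothetical sketch of the per-job retry delay proposed in FLINK-2066,
// modeled on ExecutionConfig's existing fluent setters. -1 means "fall
// back to the system-wide default". Names here are assumptions.
class ExecutionConfigSketch {
    private int numberOfExecutionRetries = -1;   // -1: use system default
    private long executionRetryDelayMillis = -1; // -1: use system default

    public ExecutionConfigSketch setNumberOfExecutionRetries(int retries) {
        if (retries < -1) {
            throw new IllegalArgumentException("retries must be >= -1");
        }
        this.numberOfExecutionRetries = retries;
        return this; // fluent style, as in the real ExecutionConfig
    }

    public ExecutionConfigSketch setExecutionRetryDelay(long millis) {
        if (millis < -1) {
            throw new IllegalArgumentException("delay must be >= -1");
        }
        this.executionRetryDelayMillis = millis;
        return this;
    }

    public long getExecutionRetryDelay() {
        return executionRetryDelayMillis;
    }
}
```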
[GitHub] flink pull request: [tools] Make release script a bit more flexibl...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/868#issuecomment-115366539 Hi @rmetzger, could you summarize the intention of the PR here? Like, what is the final goal of the changes?
[jira] [Commented] (FLINK-2163) VertexCentricConfigurationITCase sometimes fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602000#comment-14602000 ] Vasia Kalavri commented on FLINK-2163: -- This test has now been changed to use collect(). Can we assume that this issue is now resolved? VertexCentricConfigurationITCase sometimes fails on Travis -- Key: FLINK-2163 URL: https://issues.apache.org/jira/browse/FLINK-2163 Project: Flink Issue Type: Bug Components: Gelly Reporter: Aljoscha Krettek This is the relevant output from the log: {code} testIterationINDirection[Execution mode = CLUSTER](org.apache.flink.graph.test.VertexCentricConfigurationITCase) Time elapsed: 0.587 sec <<< FAILURE! java.lang.AssertionError: Different number of lines in expected and obtained result. expected:<5> but was:<2> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:270) at org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:256) at org.apache.flink.graph.test.VertexCentricConfigurationITCase.after(VertexCentricConfigurationITCase.java:70) Results : Failed tests: VertexCentricConfigurationITCase.after:70->TestBaseUtils.compareResultsByLinesInMemory:256->TestBaseUtils.compareResultsByLinesInMemory:270 Different number of lines in expected and obtained result. expected:<5> but was:<2> {code} https://travis-ci.org/aljoscha/flink/jobs/65403502 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2264) Migrate integration tests for Gelly
[ https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-2264. -- Resolution: Fixed Fix Version/s: 0.10 Congrats on your first contribution [~Samia]! Migrate integration tests for Gelly --- Key: FLINK-2264 URL: https://issues.apache.org/jira/browse/FLINK-2264 Project: Flink Issue Type: Sub-task Components: Gelly, Tests Affects Versions: 0.10 Reporter: Vasia Kalavri Assignee: Samia Khalid Priority: Minor Fix For: 0.10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1522) Add tests for the library methods and examples
[ https://issues.apache.org/jira/browse/FLINK-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-1522. -- Resolution: Fixed Fix Version/s: 0.10 Add tests for the library methods and examples -- Key: FLINK-1522 URL: https://issues.apache.org/jira/browse/FLINK-1522 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Vasia Kalavri Labels: easyfix, test Fix For: 0.10 The current tests in gelly test one method at a time. We should have some tests for complete applications. As a start, we could add one test case per example and this way also make sure that our graph library methods actually give correct results. I'm assigning this to [~andralungu] because she has already implemented the test for SSSP, but I will help as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2271) PageRank gives wrong results with weighted graph input
[ https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-2271. -- Resolution: Fixed PageRank gives wrong results with weighted graph input -- Key: FLINK-2271 URL: https://issues.apache.org/jira/browse/FLINK-2271 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri Fix For: 0.10 The current implementation of the PageRank algorithm expects a weighted edge list as input. However, if the edge weight is other than 1.0, this will result in wrong results. We should change the library method and corresponding examples (also GSAPageRank) to expect an unweighted graph and compute the transition probabilities correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
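The fix FLINK-2271 describes is to stop trusting user-supplied edge weights and instead derive each transition probability from an unweighted graph as 1 / outDegree(source). A small sketch of that computation over a plain edge list (not the actual Gelly implementation):

```java
import java.util.HashMap;
import java.util.Map;

/** Compute PageRank transition probabilities from an unweighted edge list:
 *  every edge (src, dst) gets probability 1.0 / outDegree(src).
 *  Illustrative sketch, keyed by a "src->dst" string for simplicity. */
class TransitionProbabilities {
    static Map<String, Double> fromEdges(long[][] edges) {
        // First pass: count out-degrees per source vertex
        Map<Long, Integer> outDegree = new HashMap<>();
        for (long[] e : edges) {
            outDegree.merge(e[0], 1, Integer::sum);
        }
        // Second pass: assign uniform probability over each vertex's out-edges
        Map<String, Double> probs = new HashMap<>();
        for (long[] e : edges) {
            probs.put(e[0] + "->" + e[1], 1.0 / outDegree.get(e[0]));
        }
        return probs;
    }
}
```

With this scheme the probabilities out of every vertex sum to 1.0 by construction, which is the invariant the weighted-input version violated whenever a weight differed from 1.0.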
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600798#comment-14600798 ] ASF GitHub Bot commented on FLINK-1731: --- Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-115133956 Thanks, seems like all is fine now. We will start reviewing this in the next few days. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/832#issuecomment-115134406 Hey Theo, Thanks a lot for finding my bug there ^^ PR updated to address the Java issues and to contain a pimped Scala version of `zipWithIndex` :)
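The `zipWithIndex` the PR adds is usually implemented in two phases for a partitioned DataSet: count the elements in each partition, turn the counts into start offsets, then assign each element `offset + localPosition` without any global sort. A sketch of that scheme with plain lists standing in for partitions (illustrative, not the PR's actual code):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Two-phase zipWithIndex sketch: per-partition counts -> start offsets,
 *  then each element gets offset + local position. Illustrative only. */
class ZipWithIndex {
    static List<Map.Entry<Long, String>> zip(List<List<String>> partitions) {
        // Phase 1: compute the starting offset of every partition
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        // Phase 2: emit (globalIndex, element) pairs per partition
        List<Map.Entry<Long, String>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            long i = offsets[p];
            for (String value : partitions.get(p)) {
                result.add(new SimpleEntry<>(i++, value));
            }
        }
        return result;
    }
}
```

The appeal of the two-phase approach is that only the small per-partition counts need to be exchanged between workers; the elements themselves stay where they are.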
[GitHub] flink pull request: [FLINK-2271] [FLINK-1522] [gelly] add missing ...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/865#issuecomment-115167456 Thanks for adding tests. The changes look good to me.
[jira] [Commented] (FLINK-2271) PageRank gives wrong results with weighted graph input
[ https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600890#comment-14600890 ] ASF GitHub Bot commented on FLINK-2271: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/865#issuecomment-115167456 Thanks for adding tests. The changes look good to me. PageRank gives wrong results with weighted graph input -- Key: FLINK-2271 URL: https://issues.apache.org/jira/browse/FLINK-2271 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri Fix For: 0.10 The current implementation of the PageRank algorithm expects a weighted edge list as input. However, if the edge weight is other than 1.0, this will result in wrong results. We should change the library method and corresponding examples (also GSAPageRank) to expect an unweighted graph and compute the transition probabilities correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2275) Migrate test from package 'org.apache.flink.test.javaApiOperators'
[ https://issues.apache.org/jira/browse/FLINK-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600905#comment-14600905 ] ASF GitHub Bot commented on FLINK-2275: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/866#issuecomment-115173704 I adapted all tests except for DataSinkITCase and ExecutionEnvironmentITCase. - DataSinkITCase - seems to test writing to files explicitly; it would not make sense to change it (tell me if I am wrong) - ExecutionEnvironmentITCase - uses LocalCollectionOutputFormat and does not write to disk anyway Migrate test from package 'org.apache.flink.test.javaApiOperators' -- Key: FLINK-2275 URL: https://issues.apache.org/jira/browse/FLINK-2275 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 0.10 Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2275] Migrate test from package 'org.ap...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/866#issuecomment-115173704 I adapted all tests except for DataSinkITCase and ExecutionEnvironmentITCase. - DataSinkITCase - seems to test writing to files explicitly; it would not make sense to change it (tell me if I am wrong) - ExecutionEnvironmentITCase - uses LocalCollectionOutputFormat and does not write to disk anyway
[GitHub] flink pull request: [FLINK-2161] modified Scala shell start script...
Github user nikste commented on the pull request: https://github.com/apache/flink/pull/805#issuecomment-115180597 Unfortunately this did not work: the class was available in the test, but not in the shell that the test invokes. However, if you add the classpath of the external class to ```settings.classpath.value``` of the Scala shell before starting it, it seems to work. I added a test for instantiating and printing a DenseVector with the flink-ml jar. This should check that the external jar is sent to the cluster. The only remaining problem is the name of the jar, which will change when the Flink version changes.
[jira] [Commented] (FLINK-2161) Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML)
[ https://issues.apache.org/jira/browse/FLINK-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600917#comment-14600917 ] ASF GitHub Bot commented on FLINK-2161: --- Github user nikste commented on the pull request: https://github.com/apache/flink/pull/805#issuecomment-115180597 This did not work unfortunately, the class was available in the test, but unfortunately not in the shell which is invoked in the test. However, if you add the classpath of the external class to ```settings.classpath.value``` of the scala shell before starting it, it seems to work. I added a test for instantiating and printing a DenseVector with flink-ml jar. This should check if the external jar is sent to the cluster. The only remaining problem is the name of the jar, which will change if the flink-version changes. Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML) -- Key: FLINK-2161 URL: https://issues.apache.org/jira/browse/FLINK-2161 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Nikolaas Steenbergen Currently, there is no easy way to load and ship external libraries/jars with Flink's Scala shell. Assume that you want to run some Gelly graph algorithms from within the Scala shell, then you have to put the Gelly jar manually in the lib directory and make sure that this jar is also available on your cluster, because it is not shipped with the user code. It would be good to have a simple mechanism how to specify additional jars upon startup of the Scala shell. These jars should then also be shipped to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600937#comment-14600937 ] ASF GitHub Bot commented on FLINK-1745: --- Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/696#issuecomment-115189878 Just a correction, the functionality you will need is in #832 Add exact k-nearest-neighbours algorithm to machine learning library Key: FLINK-1745 URL: https://issues.apache.org/jira/browse/FLINK-1745 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Till Rohrmann Labels: ML, Starter Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, it is still used as a means to classify data and to do regression. This issue focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as proposed in [2]. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/696#issuecomment-115189878 Just a correction, the functionality you will need is in #832
[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115180791 The easiest way is probably to check out her branch or the PR and then rebase your work on hers.
[jira] [Commented] (FLINK-2105) Implement Sort-Merge Outer Join algorithm
[ https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600934#comment-14600934 ] Ricky Pogalz commented on FLINK-2105: - Hi, first of all thanks for your answers [~chiwanpark] and [~fhueske]. We have some more questions regarding the scope of this ticket. # Is the implementation of the OperatorBase in the core project also part of this ticket or should it be part of the integration? # Same question for the Driver. Is the integration of the Iterators into the Driver part of this ticket? # Just for understanding. Is it sufficient to integrate the OuterJoinIterators in the existing MatchDriver or do we have to create a separate Driver? Thanks Implement Sort-Merge Outer Join algorithm - Key: FLINK-2105 URL: https://issues.apache.org/jira/browse/FLINK-2105 Project: Flink Issue Type: Sub-task Components: Local Runtime Reporter: Fabian Hueske Assignee: Ricky Pogalz Priority: Minor Fix For: pre-apache Flink does not natively support outer joins at the moment. This issue proposes to implement a sort-merge outer join algorithm that can cover left, right, and full outer joins. The implementation can be based on the regular sort-merge join iterators ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}; see also the {{MatchDriver}} class). The Reusing and NonReusing variants differ in whether object instances are reused or new objects are created. I would start with the NonReusing variant, which is safer from a user's point of view and should also be easier to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
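The core of the sort-merge outer join the ticket asks for can be sketched over two key-sorted inputs: advance two cursors in step, emit a matched pair on equal keys, and emit a one-sided result for the smaller key. The sketch below (plain Java over int keys, assuming unique keys per side; a real implementation like the ticket's iterators must also handle duplicate keys by crossing the matching groups) illustrates the full outer variant, with -1 marking the missing side:

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sort-merge FULL OUTER join over two key-sorted int arrays with
 *  unique keys. Emits {leftKey, rightKey} pairs, -1 for the absent side.
 *  A sketch of the algorithm, not Flink's MergeMatchIterator code. */
class SortMergeOuterJoin {
    static List<int[]> fullOuterJoin(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] == right[j]) {
                out.add(new int[]{left[i], right[j]}); // inner match
                i++; j++;
            } else if (left[i] < right[j]) {
                out.add(new int[]{left[i], -1}); // left-outer part
                i++;
            } else {
                out.add(new int[]{-1, right[j]}); // right-outer part
                j++;
            }
        }
        // Drain whichever side still has unmatched keys
        while (i < left.length) out.add(new int[]{left[i++], -1});
        while (j < right.length) out.add(new int[]{-1, right[j++]});
        return out;
    }
}
```

Left and right outer joins fall out of the same loop by simply suppressing the emission for the unwanted side, which is why one merge iterator can cover all three variants.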