[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-06-25 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/700#issuecomment-115133956
  
Thanks, seems like all is fine now. We will start reviewing this in the 
next few days.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115207592
  
You don't do it. I think it's best at the moment to only make the 
histograms available within the ml package. Everyone who wants to use them can 
then add `flink-ml` as a dependency.
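In practice, the suggested approach would mean a dependency block like the following in the consuming project's pom.xml (the coordinates follow the usual `org.apache.flink` naming and the version shown is illustrative for the 0.9 line, not taken from this thread):

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-ml</artifactId>
    <version>0.9.0</version>
</dependency>
```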


---


[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115228662
  
Also, apparently no tests were ever run after these changes. All fail on 
the build server on basic checkstyle rules even.


---


[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601093#comment-14601093
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115228662
  
Also, apparently no tests were ever run after these changes. They all fail 
on the build server, even on basic checkstyle rules.


 Add Support for Null-Values in TupleSerializer
 --

 Key: FLINK-2230
 URL: https://issues.apache.org/jira/browse/FLINK-2230
 Project: Flink
  Issue Type: Sub-task
Reporter: Shiti Saxena
Assignee: Shiti Saxena
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115248743
  
@StephanEwen hinted that the best way to go would be to decouple the 
RowTypeInfo completely from the TupleTypeInfo/TupleSerializerBase. This way, we 
get null-value support in the Table API without changing the existing code for 
tuples. This would require changing the RowTypeInfo to no longer be a child of 
CaseClassTypeInfo and creating a standalone RowSerializer and RowComparator.


---


[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601127#comment-14601127
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115248743
  
@StephanEwen hinted that the best way to go would be to decouple the 
RowTypeInfo completely from the TupleTypeInfo/TupleSerializerBase. This way, we 
get null-value support in the Table API without changing the existing code for 
tuples. This would require changing the RowTypeInfo to no longer be a child of 
CaseClassTypeInfo and creating a standalone RowSerializer and RowComparator.


 Add Support for Null-Values in TupleSerializer
 --

 Key: FLINK-2230
 URL: https://issues.apache.org/jira/browse/FLINK-2230
 Project: Flink
  Issue Type: Sub-task
Reporter: Shiti Saxena
Assignee: Shiti Saxena
Priority: Minor





--


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33256890
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+        StreamOperatorState<S, C> {
+
+    // KeySelector for getting the state partition key for each input
+    private final KeySelector<IN, Serializable> keySelector;
+
+    private final PartitionedStateStore<S, C> stateStore;
+
+    private S defaultState;
+
+    // The currently processed input, used to extract the appropriate key
+    private IN currentInput;
+
+    public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+            StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+        super(checkpointer, provider);
+        this.keySelector = keySelector;
+        this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+    }
+
+    @SuppressWarnings("unchecked")
+    public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+            KeySelector<IN, Serializable> keySelector) {
+        this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+    }
+
+    @Override
+    public S getState() {
+        if (currentInput == null) {
+            return null;
+        } else {
+            try {
+                Serializable key = keySelector.getKey(currentInput);
+                if (stateStore.containsKey(key)) {
+                    return stateStore.getStateForKey(key);
+                } else {
+                    return defaultState;
+                }
+            } catch (Exception e) {
+                throw new RuntimeException(e);
+            }
+        }
+    }
+
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

In this case the exception caught is thrown by the KeySelector, which would 
have thrown the same exception in the partitioner at the previous output 
anyway.

There is no reason to propagate this exception.
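The trade-off being debated here can be sketched in plain Java (the names below are stand-ins, not Flink code): wrapping a checked exception in a RuntimeException forces callers to unwrap the cause, while declaring the exception on the signature lets the original failure propagate with its type and message intact.

```java
// Stand-alone sketch (hypothetical names, not Flink code) contrasting the two
// styles discussed in this review thread.
public class ExceptionSignatureDemo {

    // Stand-in for KeySelector.getKey, which may throw a checked exception.
    static Object extractKey(Object input) throws Exception {
        throw new Exception("key selector failed");
    }

    // Style used in the diff: callers only ever see RuntimeException and must
    // unwrap the cause to learn what actually went wrong.
    static void updateStateWrapping(Object input) {
        try {
            extractKey(input);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Alternative style: the original exception reaches the caller unchanged.
    static void updateStateDeclaring(Object input) throws Exception {
        extractKey(input);
    }

    public static void main(String[] args) {
        try {
            updateStateWrapping(new Object());
        } catch (RuntimeException e) {
            // Caller must unwrap one level to see the real failure.
            System.out.println(e.getCause().getMessage()); // key selector failed
        }
        try {
            updateStateDeclaring(new Object());
        } catch (Exception e) {
            System.out.println(e.getMessage()); // key selector failed
        }
    }
}
```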

[jira] [Commented] (FLINK-2255) In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601038#comment-14601038
 ] 

ASF GitHub Bot commented on FLINK-2255:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/857


 In the TopSpeedWindowing examples, every window contains only 1 element, 
 because event time is in millisec, but eviction is in sec
 --

 Key: FLINK-2255
 URL: https://issues.apache.org/jira/browse/FLINK-2255
 Project: Flink
  Issue Type: Bug
  Components: Examples, Streaming
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor

 The event times are generated by System.currentTimeMillis(), so evictionSec 
 should be multiplied by 1000 when passing it to Time.of.
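The effect of the unit mismatch can be shown with a plain-Java simulation (no Flink dependency; the window counting below is a simplified stand-in for the example's eviction policy, not the actual implementation):

```java
// Event timestamps come from System.currentTimeMillis() (milliseconds), but the
// eviction span was passed in seconds, so almost every element falls outside
// the window. Multiplying by 1000 restores the intended behavior.
public class EvictionUnitsDemo {

    // Count how many millisecond timestamps lie within the eviction span
    // ending at the newest timestamp.
    static long elementsInWindow(long[] timestampsMillis, long evictionSpan) {
        long newest = timestampsMillis[timestampsMillis.length - 1];
        long count = 0;
        for (long t : timestampsMillis) {
            if (newest - t < evictionSpan) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long now = 1_435_000_000_000L;      // some wall-clock instant in ms
        long[] events = new long[100];
        for (int i = 0; i < 100; i++) {
            events[i] = now + i * 100;      // one event every 100 ms
        }
        long evictionSec = 10;
        // Span "10" is meant as seconds but interpreted as ms: ~1 element kept.
        System.out.println(elementsInWindow(events, evictionSec));          // 1
        // Multiplied by 1000, the full 10 s of events stay in the window.
        System.out.println(elementsInWindow(events, evictionSec * 1000));   // 100
    }
}
```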



--


[jira] [Commented] (FLINK-1956) Runtime context not initialized in RichWindowMapFunction

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601037#comment-14601037
 ] 

ASF GitHub Bot commented on FLINK-1956:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/855


 Runtime context not initialized in RichWindowMapFunction
 

 Key: FLINK-1956
 URL: https://issues.apache.org/jira/browse/FLINK-1956
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Daniel Bali
Assignee: Márton Balassi
  Labels: context, runtime, streaming, window
 Fix For: 0.9


 Trying to access the runtime context in a rich window map function results in 
 an exception. The following snippet demonstrates the bug:
 {code}
 env.generateSequence(0, 1000)
     .window(Count.of(10))
     .mapWindow(new RichWindowMapFunction<Long, Tuple2<Long, Long>>() {
         @Override
         public void mapWindow(Iterable<Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
             long self = getRuntimeContext().getIndexOfThisSubtask();
             for (long value : input) {
                 out.collect(new Tuple2(self, value));
             }
         }
     }).flatten().print();
 {code}



--


[GitHub] flink pull request: [FLINK-2255] [streaming] Fixed a bug in TopSpe...

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/857


---


[GitHub] flink pull request: [streaming] Properly forward rich window funct...

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/855


---


[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115210476
  
Okay. So I guess we can leave out adding a createHistogram function to 
DataSetUtils for now [it would also require utilizing FlinkMLTools.block 
for an efficient implementation]. Pending that, this PR is then ready to merge. 
Please have a look for any other modifications that are needed.


---


[jira] [Closed] (FLINK-2255) In the TopSpeedWindowing examples, every window contains only 1 element, because event time is in millisec, but eviction is in sec

2015-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-2255.
-
   Resolution: Fixed
Fix Version/s: 0.10

Fixed via af05b94d0d

 In the TopSpeedWindowing examples, every window contains only 1 element, 
 because event time is in millisec, but eviction is in sec
 --

 Key: FLINK-2255
 URL: https://issues.apache.org/jira/browse/FLINK-2255
 Project: Flink
  Issue Type: Bug
  Components: Examples, Streaming
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor
 Fix For: 0.10


 The event times are generated by System.currentTimeMillis(), so evictionSec 
 should be multiplied by 1000 when passing it to Time.of.



--


[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115227753
  
We actually had a discussion about this quite a few times. I also raised my 
concerns in the discussion of the issue, to which no one reacted.

The serialization subsystem (and tuples) is among the most critical parts of 
Flink. There are many side effects and considerations: comparators that 
interact with serializers, normalized keys, subclasses and tagging, object 
creation (GC impact). None of that is taken into account here.

For something as crucial as this, we cannot make changes without discussing 
them thoroughly first and, ideally, also documenting them. It makes sense to do 
this before writing the code.

Sorry if I appear like the bad guy here, but we are on the verge of getting 
spaghetti code and inconsistencies into one of the most crucial parts, and 
we cannot do that.


---


[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115202017
  
For the moment, I think it's best to place it under 
`org.apache.flink.ml.density`, for example.


---


[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601062#comment-14601062
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

GitHub user Shiti opened a pull request:

https://github.com/apache/flink/pull/867

[FLINK-2230] handling null values for TupleSerializer

When serializing, we add a `byte[]` (`BitSet.toByteArray`) before the fields, 
which indicates the `null` fields. 

When deserializing, we fetch the `byte[]` and obtain the underlying `BitSet` 
(`BitSet.valueOf`). `BitSet.get(index)` is `false` when the value is `null`. For 
each field, we check whether it is marked as `null` in the `BitSet` and 
then pass it to the fieldSerializer.
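The null-mask scheme described above can be sketched stand-alone (this is an illustration of the idea using `String` fields and `writeUTF` as a stand-in for the per-field serializer, not the actual TupleSerializer change in this PR):

```java
import java.io.*;
import java.util.BitSet;

// Sketch of the null-mask idea: a BitSet marking non-null fields is written as
// a length-prefixed byte[] before the field values; on read, only fields whose
// bit is set are deserialized, the rest become null.
public class NullMaskDemo {

    static byte[] serialize(String[] fields) throws IOException {
        BitSet mask = new BitSet(fields.length);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) {
                mask.set(i);                // bit set => field is non-null
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] maskBytes = mask.toByteArray();
        out.writeInt(maskBytes.length);     // length prefix: toByteArray is variable-sized
        out.write(maskBytes);
        for (String field : fields) {
            if (field != null) {
                out.writeUTF(field);        // stand-in for the per-field serializer
            }
        }
        return bytes.toByteArray();
    }

    static String[] deserialize(byte[] data, int arity) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[] maskBytes = new byte[in.readInt()];
        in.readFully(maskBytes);
        BitSet mask = BitSet.valueOf(maskBytes);
        String[] fields = new String[arity];
        for (int i = 0; i < arity; i++) {
            fields[i] = mask.get(i) ? in.readUTF() : null;  // unset bit => null
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String[] original = {"a", null, "c"};
        String[] copy = deserialize(serialize(original), 3);
        System.out.println(java.util.Arrays.toString(copy)); // [a, null, c]
    }
}
```

Note the length prefix: `BitSet.toByteArray` drops trailing zero bytes, so the reader cannot infer the mask size from the tuple arity alone.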

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shiti/flink FLINK-2230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #867


commit a4d2731eb75032f958e96323a075eb8bc7d11c73
Author: Shiti ssaxena@gmail.com
Date:   2015-06-25T10:36:10Z

[FLINK-2230]handling null values for TupleSerializer




 Add Support for Null-Values in TupleSerializer
 --

 Key: FLINK-2230
 URL: https://issues.apache.org/jira/browse/FLINK-2230
 Project: Flink
  Issue Type: Sub-task
Reporter: Shiti Saxena
Assignee: Shiti Saxena
Priority: Minor





--


[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-06-25 Thread Shiti
GitHub user Shiti opened a pull request:

https://github.com/apache/flink/pull/867

[FLINK-2230] handling null values for TupleSerializer

When serializing, we add a `byte[]` (`BitSet.toByteArray`) before the fields, 
which indicates the `null` fields. 

When deserializing, we fetch the `byte[]` and obtain the underlying `BitSet` 
(`BitSet.valueOf`). `BitSet.get(index)` is `false` when the value is `null`. For 
each field, we check whether it is marked as `null` in the `BitSet` and 
then pass it to the fieldSerializer.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shiti/flink FLINK-2230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #867


commit a4d2731eb75032f958e96323a075eb8bc7d11c73
Author: Shiti ssaxena@gmail.com
Date:   2015-06-25T10:36:10Z

[FLINK-2230]handling null values for TupleSerializer




---


[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601092#comment-14601092
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-115227753
  
We actually had a discussion about this quite a few times. I also raised my 
concerns in the discussion of the issue, to which no one reacted.

The serialization subsystem (and tuples) is among the most critical parts of 
Flink. There are many side effects and considerations: comparators that 
interact with serializers, normalized keys, subclasses and tagging, object 
creation (GC impact). None of that is taken into account here.

For something as crucial as this, we cannot make changes without discussing 
them thoroughly first and, ideally, also documenting them. It makes sense to do 
this before writing the code.

Sorry if I appear like the bad guy here, but we are on the verge of getting 
spaghetti code and inconsistencies into one of the most crucial parts, and 
we cannot do that.


 Add Support for Null-Values in TupleSerializer
 --

 Key: FLINK-2230
 URL: https://issues.apache.org/jira/browse/FLINK-2230
 Project: Flink
  Issue Type: Sub-task
Reporter: Shiti Saxena
Assignee: Shiti Saxena
Priority: Minor





--


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33253164
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+        StreamOperatorState<S, C> {
+
+    // KeySelector for getting the state partition key for each input
+    private final KeySelector<IN, Serializable> keySelector;
+
+    private final PartitionedStateStore<S, C> stateStore;
+
+    private S defaultState;
+
+    // The currently processed input, used to extract the appropriate key
+    private IN currentInput;
+
+    public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+            StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+        super(checkpointer, provider);
+        this.keySelector = keySelector;
+        this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+    }
+
+    @SuppressWarnings("unchecked")
+    public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+            KeySelector<IN, Serializable> keySelector) {
+        this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+    }
+
+    @Override
+    public S getState() {
+        if (currentInput == null) {
+            return null;
+        } else {
+            try {
+                Serializable key = keySelector.getKey(currentInput);
+                if (stateStore.containsKey(key)) {
+                    return stateStore.getStateForKey(key);
+                } else {
+                    return defaultState;
+                }
+            } catch (Exception e) {
+                throw new RuntimeException(e);
+            }
+        }
+    }
+
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

Yeah, I think this is not very nice to do. Every level of wrapping just 
makes the exceptions more horrible and the exception messages worse.

This is an indicator that the signature of `updateState()` 

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115204540
  
 How should I import a class in flink.ml.math from, say, flink-java? I tried 
adding flink-staging as a dependency to the pom.xml of flink-java, but to no 
avail. I'm not terribly familiar with Maven.


---


[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

2015-06-25 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/818#issuecomment-115215460
  
Thank you @shghatge! I'll merge this :)


---


[jira] [Commented] (FLINK-2271) PageRank gives wrong results with weighted graph input

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601137#comment-14601137
 ] 

ASF GitHub Bot commented on FLINK-2271:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115252595
  
Thank you for looking at this @mxm :) I'll merge.


 PageRank gives wrong results with weighted graph input
 --

 Key: FLINK-2271
 URL: https://issues.apache.org/jira/browse/FLINK-2271
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
 Fix For: 0.10


 The current implementation of the PageRank algorithm expects a weighted edge 
 list as input. However, if an edge weight is anything other than 1.0, the 
 algorithm will produce wrong results.
 We should change the library method and corresponding examples (also 
 GSAPageRank) to expect an unweighted graph and compute the transition 
 probabilities correctly.



--


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33253332
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+        StreamOperatorState<S, C> {
+
+    // KeySelector for getting the state partition key for each input
+    private final KeySelector<IN, Serializable> keySelector;
+
+    private final PartitionedStateStore<S, C> stateStore;
+
+    private S defaultState;
+
+    // The currently processed input, used to extract the appropriate key
+    private IN currentInput;
+
+    public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+            StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+        super(checkpointer, provider);
+        this.keySelector = keySelector;
+        this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+    }
+
+    @SuppressWarnings("unchecked")
+    public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+            KeySelector<IN, Serializable> keySelector) {
+        this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+    }
+
+    @Override
+    public S getState() {
+        if (currentInput == null) {
+            return null;
+        } else {
+            try {
+                Serializable key = keySelector.getKey(currentInput);
+                if (stateStore.containsKey(key)) {
+                    return stateStore.getStateForKey(key);
+                } else {
+                    return defaultState;
+                }
+            } catch (Exception e) {
+                throw new RuntimeException(e);
+            }
+        }
+    }
+
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

It is a good idea to start adding these exceptions to the signatures, and 
this is a good place to start.


---

[GitHub] flink pull request: [tools] Make release script a bit more flexibl...

2015-06-25 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/868#issuecomment-115262360
  
Good changes +1

There are some assumptions about the call order of the newly introduced 
functions though (e.g. you have to call `prepare` before `make_src_release`, or 
already be in the checked-out repo). I guess it's fine, we don't want to 
over-engineer this ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

2015-06-25 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601159#comment-14601159
 ] 

Stephan Ewen commented on FLINK-2239:
-

There is pending work to support larger results for {{collect()}} by letting 
them go through the BLOB manager. That is still limited by client memory, 
though.

The concern about direct connections between client and workers is that this 
fails in many enterprise setups due to firewalls. We have seen multiple 
installations with edge servers. The client can communicate with the master, 
but not the workers.

I like the idea of {{iterate()}}. Would be a bit of an effort, but seems like a 
clean solution.

 print() on DataSet: stream results and print incrementally
 --

 Key: FLINK-2239
 URL: https://issues.apache.org/jira/browse/FLINK-2239
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: 0.9
Reporter: Maximilian Michels
 Fix For: 0.10


 Users find it counter-intuitive that {{print()}} on a DataSet internally 
 calls {{collect()}} and fully materializes the set. This leads to out of 
 memory errors on the client. It also leaves users with the feeling that Flink 
 cannot handle large amounts of data and that it fails frequently.
 To improve on this situation requires some major architectural changes in 
 Flink. The easiest solution would probably be to transfer the data from the 
 job manager to the client via the {{BlobManager}}. Alternatively, the client 
 could directly connect to the task managers and fetch the results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2093) Add a difference method to Gelly's Graph class

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601031#comment-14601031
 ] 

ASF GitHub Bot commented on FLINK-2093:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/818#issuecomment-115215460
  
Thank you @shghatge! I'll merge this :)


 Add a difference method to Gelly's Graph class
 --

 Key: FLINK-2093
 URL: https://issues.apache.org/jira/browse/FLINK-2093
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Shivani Ghatge
Priority: Minor

 This method will compute the difference between two graphs, returning a new 
 graph containing the vertices and edges that the current graph and the input 
 graph don't have in common. 
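
For illustration, the core of such a difference operation can be sketched with 
plain Java sets, reading the description as "current graph minus the common 
vertices, with dangling edges pruned" (class and method names here are made up, 
not Gelly's actual API):

```java
import java.util.HashSet;
import java.util.Set;

public class GraphDifference {

    // Vertices of the current graph that the input graph does not contain.
    public static Set<Long> differenceVertices(Set<Long> current, Set<Long> input) {
        Set<Long> kept = new HashSet<>(current);
        kept.removeAll(input);
        return kept;
    }

    // Keep only edges whose source and target both survive the difference.
    public static Set<long[]> pruneEdges(Set<long[]> edges, Set<Long> keptVertices) {
        Set<long[]> kept = new HashSet<>();
        for (long[] e : edges) {
            if (keptVertices.contains(e[0]) && keptVertices.contains(e[1])) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

In Gelly the same logic would of course be expressed on distributed DataSets of 
vertices and edges rather than in-memory sets.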



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [tools] Make release script a bit more flexibl...

2015-06-25 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/868

[tools] Make release script a bit more flexible



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink release_script

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/868.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #868


commit 1cefd8abd3a0527f380f22db0baae6bebb2a952f
Author: Robert Metzger rmetz...@apache.org
Date:   2015-06-25T12:58:08Z

[tools] Make release script a bit more flexible




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2264] [gelly] changed the tests to use ...

2015-06-25 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/863#issuecomment-115194215
  
Thank you @samk3211! This looks good :)

I see that like here, @mjsax has also created a utils class for the new 
comparison methods in #866.
Since all migrated tests will be using these methods, I will just move them 
to `TestBaseUtils` before merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2264) Migrate integration tests for Gelly

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600944#comment-14600944
 ] 

ASF GitHub Bot commented on FLINK-2264:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/863#issuecomment-115194215
  
Thank you @samk3211! This looks good :)

I see that like here, @mjsax has also created a utils class for the new 
comparison methods in #866.
Since all migrated tests will be using these methods, I will just move them 
to `TestBaseUtils` before merging this.


 Migrate integration tests for Gelly
 ---

 Key: FLINK-2264
 URL: https://issues.apache.org/jira/browse/FLINK-2264
 Project: Flink
  Issue Type: Sub-task
  Components: Gelly, Tests
Affects Versions: 0.10
Reporter: Vasia Kalavri
Assignee: Samia Khalid
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

2015-06-25 Thread Sebastian Kruse (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600948#comment-14600948
 ] 

Sebastian Kruse commented on FLINK-2239:


I used the {{RemoteCollectorOutputFormat}} for this purpose and that always 
worked pretty well. However, the downside of it is that it uses Java RMI, which 
is not using Flink's serialization stack and also sometimes requires to set up 
the client address via {{-Djava.rmi.server.hostname}}.

Additionally, I would like to remark that there is a more general issue behind 
this: if one wants to ship larger job results to the driver (e.g., in order to 
write them to a local DB), {{collect()}} also falls flat. Something like an 
{{iterate()}} method that streams the result to the client without 
materializing it would help in such cases. The proposed change to {{print()}} 
is then just a special instance of such an {{iterate()}} method.

 print() on DataSet: stream results and print incrementally
 --

 Key: FLINK-2239
 URL: https://issues.apache.org/jira/browse/FLINK-2239
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: 0.9
Reporter: Maximilian Michels
 Fix For: 0.10


 Users find it counter-intuitive that {{print()}} on a DataSet internally 
 calls {{collect()}} and fully materializes the set. This leads to out of 
 memory errors on the client. It also leaves users with the feeling that Flink 
 cannot handle large amounts of data and that it fails frequently.
 To improve on this situation requires some major architectural changes in 
 Flink. The easiest solution would probably be to transfer the data from the 
 job manager to the client via the {{BlobManager}}. Alternatively, the client 
 could directly connect to the task managers and fetch the results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2108) Add score function for Predictors

2015-06-25 Thread Theodore Vasiloudis (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Theodore Vasiloudis reassigned FLINK-2108:
--

Assignee: Theodore Vasiloudis  (was: Sachin Goel)

 Add score function for Predictors
 -

 Key: FLINK-2108
 URL: https://issues.apache.org/jira/browse/FLINK-2108
 Project: Flink
  Issue Type: Improvement
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis
Assignee: Theodore Vasiloudis
Priority: Minor
  Labels: ML

 A score function for Predictor implementations should take a DataSet[(I, O)] 
 and an (optional) scoring measure and return a score.
 The DataSet[(I, O)] would probably be the output of the predict function.
 For example in MultipleLinearRegression, we can call predict on a labeled 
 dataset, get back predictions for each item in the data, and then call score 
 with the resulting dataset as an argument and we should get back a score for 
 the prediction quality, such as the R^2 score.
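
As a concrete illustration of the R^2 score mentioned above, here is a minimal 
plain-Java sketch computing it from (truth, prediction) pairs. This is 
independent of Flink's actual Predictor API (which is still being discussed); 
the class and method names are made up:

```java
import java.util.List;

public class R2Score {

    // R^2 = 1 - SS_res / SS_tot, where each pair is {truth, prediction}.
    public static double r2(List<double[]> pairs) {
        // Mean of the true values
        double mean = 0.0;
        for (double[] p : pairs) {
            mean += p[0];
        }
        mean /= pairs.size();

        // Residual and total sums of squares
        double ssRes = 0.0, ssTot = 0.0;
        for (double[] p : pairs) {
            ssRes += (p[0] - p[1]) * (p[0] - p[1]);
            ssTot += (p[0] - mean) * (p[0] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```

A perfect predictor yields an R^2 of 1.0; in the pipeline, the input list would 
correspond to the DataSet[(I, O)] produced by predict.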



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2108) Add score function for Predictors

2015-06-25 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600951#comment-14600951
 ] 

Theodore Vasiloudis commented on FLINK-2108:


OK I will take this then, the interface will be similar to what sklearn uses.

 Add score function for Predictors
 -

 Key: FLINK-2108
 URL: https://issues.apache.org/jira/browse/FLINK-2108
 Project: Flink
  Issue Type: Improvement
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis
Assignee: Sachin Goel
Priority: Minor
  Labels: ML

 A score function for Predictor implementations should take a DataSet[(I, O)] 
 and an (optional) scoring measure and return a score.
 The DataSet[(I, O)] would probably be the output of the predict function.
 For example in MultipleLinearRegression, we can call predict on a labeled 
 dataset, get back predictions for each item in the data, and then call score 
 with the resulting dataset as an argument and we should get back a score for 
 the prediction quality, such as the R^2 score.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2276) Travis build error

2015-06-25 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2276:
--

 Summary: Travis build error
 Key: FLINK-2276
 URL: https://issues.apache.org/jira/browse/FLINK-2276
 Project: Flink
  Issue Type: Bug
Reporter: Sachin Goel


testExecutionFailsAfterTaskMarkedFailed failed on Travis. 
Here is the log output: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/68288986/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2116] [ml] Reusing predict operation fo...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/772#issuecomment-115198124
  
Actually I wouldn't call it predictSomething, because then we're again 
quite close to the former problem that we have a method whose semantics depend 
on the provided type. And this only confuses users. 

My concern is that the user does not really know what `predictLabeled` 
means. Apparently it is something similar to `predict` but with a label. But 
what is the label? Does it mean that I can apply `predict` on `T : Vector` and 
`predictLabeled` on `LabeledVector`? Does it mean that I get a labeled result 
type? But don't I already get it with `predict`? Do I have to provide a type 
with a label or can I also supply a vector? 

IMO, the prediction which also returns the true label value deserves a more 
distinguishable name than `predictSomething`, because it has different 
semantics. I can't think of something better than `evaluate` at the moment. But 
it makes it clear that the user has to provide some evaluation `DataSet`, 
meaning some labeled data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2116) Make pipeline extension require less coding

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600958#comment-14600958
 ] 

ASF GitHub Bot commented on FLINK-2116:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/772#issuecomment-115198124
  
Actually I wouldn't call it predictSomething, because then we're again 
quite close to the former problem that we have a method whose semantics depend 
on the provided type. And this only confuses users. 

My concern is that the user does not really know what `predictLabeled` 
means. Apparently it is something similar to `predict` but with a label. But 
what is the label? Does it mean that I can apply `predict` on `T : Vector` and 
`predictLabeled` on `LabeledVector`? Does it mean that I get a labeled result 
type? But don't I already get it with `predict`? Do I have to provide a type 
with a label or can I also supply a vector? 

IMO, the prediction which also returns the true label value deserves a more 
distinguishable name than `predictSomething`, because it has different 
semantics. I can't think of something better than `evaluate` at the moment. But 
it makes it clear that the user has to provide some evaluation `DataSet`, 
meaning some labeled data.


 Make pipeline extension require less coding
 ---

 Key: FLINK-2116
 URL: https://issues.apache.org/jira/browse/FLINK-2116
 Project: Flink
  Issue Type: Improvement
  Components: Machine Learning Library
Reporter: Mikio Braun
Assignee: Till Rohrmann
Priority: Minor

 Right now, implementing methods from the pipelines for new types, or even 
 adding new methods to pipelines, requires many steps:
 1) implementing methods for new types
   - implement an implicit of the corresponding class encapsulating the 
 operation in the companion object
 2) adding methods to the pipeline
   - add a method
   - add a trait for the operation
   - implement an implicit in the companion object
 These are all objects which contain many generic parameters, so reducing the 
 work would be great.
 The goal should be that you can really focus on the code to add, and have as 
 little boilerplate code as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115199291
  
Where should I place the Histogram implementations? Currently, they are in 
{{org.apache.flink.ml.math}}, but I can't import them from flink-core, where 
DataSetUtils is located. Besides, since the purpose is to make the Histograms 
usable in general, they shouldn't be in the ml library.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2093][gelly] Added difference Method

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/818


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2264] [gelly] changed the tests to use ...

2015-06-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/863


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2264) Migrate integration tests for Gelly

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601972#comment-14601972
 ] 

ASF GitHub Bot commented on FLINK-2264:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/863


 Migrate integration tests for Gelly
 ---

 Key: FLINK-2264
 URL: https://issues.apache.org/jira/browse/FLINK-2264
 Project: Flink
  Issue Type: Sub-task
  Components: Gelly, Tests
Affects Versions: 0.10
Reporter: Vasia Kalavri
Assignee: Samia Khalid
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2093) Add a difference method to Gelly's Graph class

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601971#comment-14601971
 ] 

ASF GitHub Bot commented on FLINK-2093:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/818


 Add a difference method to Gelly's Graph class
 --

 Key: FLINK-2093
 URL: https://issues.apache.org/jira/browse/FLINK-2093
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Shivani Ghatge
Priority: Minor

 This method will compute the difference between two graphs, returning a new 
 graph containing the vertices and edges that the current graph and the input 
 graph don't have in common. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33259296
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+   StreamOperatorState<S, C> {
+
+   // KeySelector for getting the state partition key for each input
+   private final KeySelector<IN, Serializable> keySelector;
+
+   private final PartitionedStateStore<S, C> stateStore;
+
+   private S defaultState;
+
+   // The currently processed input, used to extract the appropriate key
+   private IN currentInput;
+
+   public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+         StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+      super(checkpointer, provider);
+      this.keySelector = keySelector;
+      this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+   }
+
+   @SuppressWarnings("unchecked")
+   public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+         KeySelector<IN, Serializable> keySelector) {
+      this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+   }
+
+   @Override
+   public S getState() {
+      if (currentInput == null) {
+         return null;
+      } else {
+         try {
+            Serializable key = keySelector.getKey(currentInput);
+            if (stateStore.containsKey(key)) {
+               return stateStore.getStateForKey(key);
+            } else {
+               return defaultState;
+            }
+         } catch (Exception e) {
+            throw new RuntimeException(e);
+         }
+      }
+   }
+
+   @Override
+   public void updateState(S state) {
+      if (state == null) {
+         throw new RuntimeException("Cannot set state to null.");
+      }
+      if (currentInput == null) {
+         throw new RuntimeException("Need a valid input for updating a state.");
+      } else {
+         try {
+            stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+         } catch (Exception e) {
+            throw new RuntimeException(e);
--- End diff --

Then I guess the getState method should throw an IOException as well
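
For illustration, the alternative being suggested — declaring a checked 
IOException on the state accessors instead of wrapping every failure in a 
RuntimeException — might look like the following sketch. The interface and 
class names here are illustrative, not the actual Flink interfaces:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical state interface with checked exceptions in the signatures.
interface CheckpointedState<S> {
    S getState() throws IOException;
    void updateState(S state) throws IOException;
}

// Trivial in-memory implementation, just to show callers now have to
// handle (or propagate) the IOException explicitly.
class InMemoryState implements CheckpointedState<String> {
    private final Map<String, String> store = new HashMap<>();

    @Override
    public String getState() throws IOException {
        return store.get("key");
    }

    @Override
    public void updateState(String state) throws IOException {
        if (state == null) {
            throw new IOException("Cannot set state to null.");
        }
        store.put("key", state);
    }
}
```

With this shape, state-backend failures surface as checked exceptions at the 
call sites rather than as anonymous RuntimeException wrappers.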


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this 

[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/747#issuecomment-115275285
  
I think we have no real blocker here. I would prefer the exception issue 
could be addressed (message for wrapping exception).

Everything else will probably show best when we implement sample jobs and 
sample backends for this new functionality.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2105) Implement Sort-Merge Outer Join algorithm

2015-06-25 Thread Chiwan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601254#comment-14601254
 ] 

Chiwan Park commented on FLINK-2105:


Hi [~r-pogalz],

I think that this issue covers only the implementation of the iterators, not 
the integration. FLINK-687 should cover the integration with drivers and 
optimizers.

We need a new Driver because an outer join returns a different result from an 
equi-join (MatchDriver). But the Driver is not for the sort-merge based outer 
join only; a hash-based outer join will use the same Driver. If I understand 
correctly, a Driver returns the same result regardless of the strategy.

 Implement Sort-Merge Outer Join algorithm
 -

 Key: FLINK-2105
 URL: https://issues.apache.org/jira/browse/FLINK-2105
 Project: Flink
  Issue Type: Sub-task
  Components: Local Runtime
Reporter: Fabian Hueske
Assignee: Ricky Pogalz
Priority: Minor
 Fix For: pre-apache


 Flink does not natively support outer joins at the moment. 
 This issue proposes to implement a sort-merge outer join algorithm that can 
 cover left, right, and full outer joins.
 The implementation can be based on the regular sort-merge join iterator 
 ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}, see also 
 {{MatchDriver}} class)
 The Reusing and NonReusing variants differ in whether object instances are 
 reused or new objects are created. I would start with the NonReusing variant 
 which is safer from a user's point of view and should also be easier to 
 implement.
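
The merge logic at the heart of such an iterator can be sketched over two 
sorted, duplicate-free key arrays (a simplification: the real iterators must 
also handle duplicate keys and pair actual records, not just keys; class and 
method names here are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class SortMergeOuterJoin {

    // Full outer join of two ascending key arrays; an unmatched side is null.
    public static List<Integer[]> fullOuterJoin(int[] left, int[] right) {
        List<Integer[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] == right[j]) {
                out.add(new Integer[]{left[i++], right[j++]}); // match
            } else if (left[i] < right[j]) {
                out.add(new Integer[]{left[i++], null});       // left outer
            } else {
                out.add(new Integer[]{null, right[j++]});      // right outer
            }
        }
        // Drain the remaining unmatched keys from either side
        while (i < left.length) {
            out.add(new Integer[]{left[i++], null});
        }
        while (j < right.length) {
            out.add(new Integer[]{null, right[j++]});
        }
        return out;
    }
}
```

Left and right outer joins fall out of the same merge by only emitting the 
corresponding unmatched side.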



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33259615
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+   StreamOperatorState<S, C> {
+
+   // KeySelector for getting the state partition key for each input
+   private final KeySelector<IN, Serializable> keySelector;
+
+   private final PartitionedStateStore<S, C> stateStore;
+
+   private S defaultState;
+
+   // The currently processed input, used to extract the appropriate key
+   private IN currentInput;
+
+   public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+         StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+      super(checkpointer, provider);
+      this.keySelector = keySelector;
+      this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+   }
+
+   @SuppressWarnings("unchecked")
+   public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+         KeySelector<IN, Serializable> keySelector) {
+      this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+   }
+
+   @Override
+   public S getState() {
+      if (currentInput == null) {
+         return null;
+      } else {
+         try {
+            Serializable key = keySelector.getKey(currentInput);
+            if (stateStore.containsKey(key)) {
+               return stateStore.getStateForKey(key);
+            } else {
+               return defaultState;
+            }
+         } catch (Exception e) {
+            throw new RuntimeException(e);
+         }
+      }
+   }
+
+   @Override
+   public void updateState(S state) {
+      if (state == null) {
+         throw new RuntimeException("Cannot set state to null.");
+      }
+      if (currentInput == null) {
+         throw new RuntimeException("Need a valid input for updating a state.");
+      } else {
+         try {
+            stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+         } catch (Exception e) {
+            throw new RuntimeException(e);
--- End diff --

Okay, good point :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is 

[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/747#issuecomment-115268784
  
Should we merge this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/747#issuecomment-115277122
  
Okay I will fix the exceptions and will merge it afterwards


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: New operator state interfaces

2015-06-25 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r33259429
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/state/PartitionedStreamOperatorState.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.state;
+
+import java.io.Serializable;
+import java.util.Map;
+
+import org.apache.flink.api.common.state.OperatorState;
+import org.apache.flink.api.common.state.StateCheckpointer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.runtime.state.PartitionedStateStore;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.state.StateHandleProvider;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+
+/**
+ * Implementation of the {@link OperatorState} interface for partitioned user
+ * states. It provides methods for checkpointing and restoring partitioned
+ * operator states upon failure.
+ *
+ * @param <IN>
+ *            Input type of the underlying {@link OneInputStreamOperator}
+ * @param <S>
+ *            Type of the underlying {@link OperatorState}.
+ * @param <C>
+ *            Type of the state snapshot.
+ */
+public class PartitionedStreamOperatorState<IN, S, C extends Serializable> extends
+        StreamOperatorState<S, C> {
+
+    // KeySelector for getting the state partition key for each input
+    private final KeySelector<IN, Serializable> keySelector;
+
+    private final PartitionedStateStore<S, C> stateStore;
+
+    private S defaultState;
+
+    // The currently processed input, used to extract the appropriate key
+    private IN currentInput;
+
+    public PartitionedStreamOperatorState(StateCheckpointer<S, C> checkpointer,
+            StateHandleProvider<C> provider, KeySelector<IN, Serializable> keySelector) {
+        super(checkpointer, provider);
+        this.keySelector = keySelector;
+        this.stateStore = new EagerStateStore<S, C>(checkpointer, provider);
+    }
+
+    @SuppressWarnings("unchecked")
+    public PartitionedStreamOperatorState(StateHandleProvider<C> provider,
+            KeySelector<IN, Serializable> keySelector) {
+        this((StateCheckpointer<S, C>) new BasicCheckpointer(), provider, keySelector);
+    }
+
+    @Override
+    public S getState() {
+        if (currentInput == null) {
+            return null;
+        } else {
+            try {
+                Serializable key = keySelector.getKey(currentInput);
+                if (stateStore.containsKey(key)) {
+                    return stateStore.getStateForKey(key);
+                } else {
+                    return defaultState;
+                }
+            } catch (Exception e) {
+                throw new RuntimeException(e);
+            }
+        }
+    }
+
+    @Override
+    public void updateState(S state) {
+        if (state == null) {
+            throw new RuntimeException("Cannot set state to null.");
+        }
+        if (currentInput == null) {
+            throw new RuntimeException("Need a valid input for updating a state.");
+        } else {
+            try {
+                stateStore.setStateForKey(keySelector.getKey(currentInput), state);
+            } catch (Exception e) {
+                throw new RuntimeException(e);
--- End diff --

If we want to keep this open, then yes. Really depends on that decision.

On the other hand, removing exceptions usually does not break code. Adding 
them does...
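
The compatibility point can be illustrated with a minimal plain-Java sketch (invented names, not Flink code): an override may declare fewer checked exceptions than its interface method, so dropping a `throws` clause later keeps implementations and callers compiling, while adding one breaks code written against the clause-free signature.

```java
// Hypothetical names for illustration only.
interface StateStore {
    // Permissive signature: callers must handle Exception.
    Object get(String key) throws Exception;
}

class MapStateStore implements StateStore {
    // An override may legally declare FEWER checked exceptions than the
    // interface method, so removing "throws Exception" from the interface
    // later would not break this class.
    @Override
    public Object get(String key) {
        return key.toUpperCase();
    }
}

public class ExceptionCompatDemo {
    public static void main(String[] args) throws Exception {
        StateStore store = new MapStateStore();
        // Callers written against the "throws Exception" signature keep
        // compiling if the clause is removed; adding a checked exception to a
        // previously clause-free method breaks existing callers instead.
        System.out.println(store.get("state")); // prints "STATE"
    }
}
```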


---

[jira] [Resolved] (FLINK-2232) StormWordCountLocalITCase fails

2015-06-25 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved FLINK-2232.

Resolution: Fixed

 StormWordCountLocalITCase fails
 ---

 Key: FLINK-2232
 URL: https://issues.apache.org/jira/browse/FLINK-2232
 Project: Flink
  Issue Type: Bug
  Components: Streaming, Tests
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Matthias J. Sax
Priority: Minor

 https://travis-ci.org/apache/flink/jobs/66936476
  
 {code}
 StormWordCountLocalITCase>StreamingProgramTestBase.testJobWithoutObjectReuse:109->postSubmit:40->TestBaseUtils.compareResultsByLinesInMemory:256->TestBaseUtils.compareResultsByLinesInMemory:270
  Different number of lines in expected and obtained result. expected:<801> but was:<0>
 {code}
 Can we disable the test until this is fixed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2066) Make delay between execution retries configurable

2015-06-25 Thread Nuno Miguel Marques dos Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601529#comment-14601529
 ] 

Nuno Miguel Marques dos Santos commented on FLINK-2066:
---

Hi guys.

I am going to start working on this issue.

Any questions I'll be sure to give a shout in the mailing list!

 Make delay between execution retries configurable
 -

 Key: FLINK-2066
 URL: https://issues.apache.org/jira/browse/FLINK-2066
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.9
Reporter: Stephan Ewen
  Labels: starter

 Flink allows to specify a delay between execution retries. This helps to let 
 some external failure causes fully manifest themselves before the restart is 
 attempted.
 The delay is currently defined only system wide.
 We should add it to the {{ExecutionConfig}} of a job to allow per-job 
 specification.
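
The behaviour being made configurable can be sketched in plain Java (an illustration of retry-with-delay, not the actual Flink API): the delay gives a transient external failure time to clear before the next attempt.

```java
import java.util.concurrent.Callable;

// Plain-Java sketch of execution retries with a configurable delay.
public class RetryWithDelay {
    public static <T> T runWithRetries(Callable<T> task, int retries, long delayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < retries) {
                    Thread.sleep(delayMs); // the delay this issue makes per-job configurable
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // A task that fails twice and then succeeds on the third attempt.
        String result = runWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("transient failure");
            }
            return "done";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```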



--


[GitHub] flink pull request: [tools] Make release script a bit more flexibl...

2015-06-25 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/868#issuecomment-115366539
  
Hi @rmetzger, could you summarize the intention of this PR here? What is 
the final goal of the changes?


---


[jira] [Commented] (FLINK-2163) VertexCentricConfigurationITCase sometimes fails on Travis

2015-06-25 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602000#comment-14602000
 ] 

Vasia Kalavri commented on FLINK-2163:
--

This test has now been changed to use collect(). Can we assume that this issue 
is now resolved?

 VertexCentricConfigurationITCase sometimes fails on Travis
 --

 Key: FLINK-2163
 URL: https://issues.apache.org/jira/browse/FLINK-2163
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Reporter: Aljoscha Krettek

 This is the relevant output from the log:
 {code}
 testIterationINDirection[Execution mode = CLUSTER](org.apache.flink.graph.test.VertexCentricConfigurationITCase)  Time elapsed: 0.587 sec  <<< FAILURE!
 java.lang.AssertionError: Different number of lines in expected and obtained result. expected:<5> but was:<2>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at 
 org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:270)
   at 
 org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:256)
   at 
 org.apache.flink.graph.test.VertexCentricConfigurationITCase.after(VertexCentricConfigurationITCase.java:70)
 Results :
 Failed tests: 
   
 VertexCentricConfigurationITCase.after:70->TestBaseUtils.compareResultsByLinesInMemory:256->TestBaseUtils.compareResultsByLinesInMemory:270
  Different number of lines in expected and obtained result. expected:<5> but was:<2>
 {code}
 https://travis-ci.org/aljoscha/flink/jobs/65403502



--


[jira] [Resolved] (FLINK-2264) Migrate integration tests for Gelly

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2264.
--
   Resolution: Fixed
Fix Version/s: 0.10

Congrats on your first contribution [~Samia]!

 Migrate integration tests for Gelly
 ---

 Key: FLINK-2264
 URL: https://issues.apache.org/jira/browse/FLINK-2264
 Project: Flink
  Issue Type: Sub-task
  Components: Gelly, Tests
Affects Versions: 0.10
Reporter: Vasia Kalavri
Assignee: Samia Khalid
Priority: Minor
 Fix For: 0.10






--


[jira] [Resolved] (FLINK-1522) Add tests for the library methods and examples

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-1522.
--
   Resolution: Fixed
Fix Version/s: 0.10

 Add tests for the library methods and examples
 --

 Key: FLINK-1522
 URL: https://issues.apache.org/jira/browse/FLINK-1522
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
  Labels: easyfix, test
 Fix For: 0.10


 The current tests in gelly test one method at a time. We should have some 
 tests for complete applications. As a start, we could add one test case per 
 example and this way also make sure that our graph library methods actually 
 give correct results.
 I'm assigning this to [~andralungu] because she has already implemented the 
 test for SSSP, but I will help as well.



--


[jira] [Resolved] (FLINK-2271) PageRank gives wrong results with weighted graph input

2015-06-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2271.
--
Resolution: Fixed

 PageRank gives wrong results with weighted graph input
 --

 Key: FLINK-2271
 URL: https://issues.apache.org/jira/browse/FLINK-2271
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
 Fix For: 0.10


 The current implementation of the PageRank algorithm expects a weighted edge 
 list as input. However, any edge weight other than 1.0 will produce wrong 
 results.
 We should change the library method and corresponding examples (also 
 GSAPageRank) to expect an unweighted graph and compute the transition 
 probabilities correctly.
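
The intended fix can be sketched in plain Java (a toy illustration, not the Gelly code): on an unweighted graph, the transition probability is derived from the sender's out-degree, rather than taken from an edge weight supplied by the input.

```java
import java.util.Arrays;

// One PageRank update step on an unweighted graph: each vertex distributes
// rank / outDegree along every out-edge.
public class PageRankStep {
    // edges[i] = {src, dst}; n vertices; beta is the damping factor.
    public static double[] step(int n, int[][] edges, double[] rank, double beta) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double[] next = new double[n];
        Arrays.fill(next, (1 - beta) / n); // teleport term
        for (int[] e : edges) {
            // Transition probability = 1 / outDegree, NOT an input edge weight.
            next[e[1]] += beta * rank[e[0]] / outDegree[e[0]];
        }
        return next;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 0}};
        double[] rank = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        double[] next = step(3, edges, rank, 0.85);
        // The ranks remain a probability distribution (they sum to 1).
        System.out.println(Arrays.toString(next));
    }
}
```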



--


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600798#comment-14600798
 ] 

ASF GitHub Bot commented on FLINK-1731:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/700#issuecomment-115133956
  
Thanks, seems like all is fine now. We will start reviewing this in the 
next few days.


 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
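
As a rough illustration of the kMeans++ seeding idea from [1] (plain Java on small arrays, not flink-ml code): pick the first center uniformly, then pick each further center with probability proportional to the squared distance to the nearest already-chosen center.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of kMeans++ "D^2 weighting" seeding.
public class KMeansPlusPlusSeeding {
    public static List<double[]> chooseCenters(double[][] points, int k, Random rnd) {
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rnd.nextInt(points.length)]); // first center: uniform
        while (centers.size() < k) {
            double[] d2 = new double[points.length];
            double total = 0;
            for (int i = 0; i < points.length; i++) {
                d2[i] = Double.MAX_VALUE;
                for (double[] c : centers) {
                    d2[i] = Math.min(d2[i], sqDist(points[i], c));
                }
                total += d2[i];
            }
            // Sample the next center proportionally to d2.
            double r = rnd.nextDouble() * total;
            int next = points.length - 1;
            double acc = 0;
            for (int i = 0; i < points.length; i++) {
                acc += d2[i];
                if (acc >= r) { next = i; break; }
            }
            centers.add(points[next]);
        }
        return centers;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        List<double[]> centers = chooseCenters(points, 2, new Random(7));
        System.out.println("chose " + centers.size() + " centers");
    }
}
```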



--


[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-25 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/832#issuecomment-115134406
  
Hey Theo, 

Thanks a lot for finding my bug there ^^
PR updated to address the Java issues and to contain a pimped Scala 
version of `zipWithIndex` :) 
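
For context, the two-phase idea behind a distributed `zipWithIndex` can be sketched in plain Java (an illustration over in-memory "partitions", not the PR's code): first count the elements of each partition, then give each partition a start offset and number its elements locally from there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Two-phase zipWithIndex sketch: counts -> offsets -> local numbering.
public class ZipWithIndexSketch {
    public static <T> List<List<Object[]>> zipWithIndex(List<List<T>> partitions) {
        // Phase 1: per-partition counts become start offsets.
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        // Phase 2: each partition numbers its elements from its offset.
        List<List<Object[]>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<Object[]> indexed = new ArrayList<>();
            long index = offsets[p];
            for (T element : partitions.get(p)) {
                indexed.add(new Object[]{index++, element});
            }
            result.add(indexed);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        List<List<Object[]>> out = zipWithIndex(parts);
        // First element of partition 1 gets index 2 (partition 0 holds indices 0 and 1).
        System.out.println(out.get(1).get(0)[0]);
    }
}
```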


---


[GitHub] flink pull request: [FLINK-2271] [FLINK-1522] [gelly] add missing ...

2015-06-25 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115167456
  
Thanks for adding tests. The changes look good to me.


---


[jira] [Commented] (FLINK-2271) PageRank gives wrong results with weighted graph input

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600890#comment-14600890
 ] 

ASF GitHub Bot commented on FLINK-2271:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/865#issuecomment-115167456
  
Thanks for adding tests. The changes look good to me.


 PageRank gives wrong results with weighted graph input
 --

 Key: FLINK-2271
 URL: https://issues.apache.org/jira/browse/FLINK-2271
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
 Fix For: 0.10


 The current implementation of the PageRank algorithm expects a weighted edge 
 list as input. However, any edge weight other than 1.0 will produce wrong 
 results.
 We should change the library method and corresponding examples (also 
 GSAPageRank) to expect an unweighted graph and compute the transition 
 probabilities correctly.



--


[jira] [Commented] (FLINK-2275) Migrate test from package 'org.apache.flink.test.javaApiOperators'

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600905#comment-14600905
 ] 

ASF GitHub Bot commented on FLINK-2275:
---

Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/866#issuecomment-115173704
  
I adapted all tests except for DataSinkITCase and 
ExecutionEnvironmentITCase.
 - DataSinkITCase - seems to test writing to a file explicitly; it would not 
make sense to change it (tell me if I am wrong)
 - ExecutionEnvironmentITCase - uses LocalCollectionOutputFormat and thus 
already does not write to disk


 Migrate test from package 'org.apache.flink.test.javaApiOperators'
 --

 Key: FLINK-2275
 URL: https://issues.apache.org/jira/browse/FLINK-2275
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 0.10
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor





--


[GitHub] flink pull request: [FLINK-2275] Migrate test from package 'org.ap...

2015-06-25 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/866#issuecomment-115173704
  
I adapted all tests except for DataSinkITCase and 
ExecutionEnvironmentITCase.
 - DataSinkITCase - seems to test writing to a file explicitly; it would not 
make sense to change it (tell me if I am wrong)
 - ExecutionEnvironmentITCase - uses LocalCollectionOutputFormat and thus 
already does not write to disk


---


[GitHub] flink pull request: [FLINK-2161] modified Scala shell start script...

2015-06-25 Thread nikste
Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/805#issuecomment-115180597
  
Unfortunately this did not work: the class was available in the test, but 
not in the shell that the test invokes. 
However, if you add the classpath of the external class to 
```settings.classpath.value``` of the Scala shell before starting it, it seems 
to work.

I added a test for instantiating and printing a DenseVector with flink-ml 
jar. This should check if the external jar is sent to the cluster.
The only remaining problem is the name of the jar, which will change if the 
flink-version changes. 


---


[jira] [Commented] (FLINK-2161) Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML)

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600917#comment-14600917
 ] 

ASF GitHub Bot commented on FLINK-2161:
---

Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/805#issuecomment-115180597
  
Unfortunately this did not work: the class was available in the test, but 
not in the shell that the test invokes. 
However, if you add the classpath of the external class to 
```settings.classpath.value``` of the Scala shell before starting it, it seems 
to work.

I added a test for instantiating and printing a DenseVector with flink-ml 
jar. This should check if the external jar is sent to the cluster.
The only remaining problem is the name of the jar, which will change if the 
flink-version changes. 


 Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML)
 --

 Key: FLINK-2161
 URL: https://issues.apache.org/jira/browse/FLINK-2161
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Nikolaas Steenbergen

 Currently, there is no easy way to load and ship external libraries/jars with 
 Flink's Scala shell. Assume that you want to run some Gelly graph algorithms 
 from within the Scala shell, then you have to put the Gelly jar manually in 
 the lib directory and make sure that this jar is also available on your 
 cluster, because it is not shipped with the user code. 
 It would be good to have a simple mechanism how to specify additional jars 
 upon startup of the Scala shell. These jars should then also be shipped to 
 the cluster.



--


[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600937#comment-14600937
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-115189878
  
Just a correction, the functionality you will need is in #832 


 Add exact k-nearest-neighbours algorithm to machine learning library
 

 Key: FLINK-1745
 URL: https://issues.apache.org/jira/browse/FLINK-1745
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Till Rohrmann
  Labels: ML, Starter

 Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
 it is still used as a means to classify data and to do regression. This issue 
 focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
 proposed in [2].
 Could be a starter task.
 Resources:
 [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
 [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
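
As a minimal illustration of the exact brute-force kNN baseline the issue mentions (plain Java, not the proposed H-BNLJ/H-BRJ implementation): compute the distance from the query to every training point and vote among the k closest labels.

```java
import java.util.Arrays;
import java.util.Comparator;

// Brute-force exact kNN classification on small in-memory arrays.
public class KnnSketch {
    // Assumes integer class labels in the range 0..9 for simplicity.
    public static int classify(double[][] train, int[] labels, double[] query, int k) {
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < train.length; i++) order[i] = i;
        // Sort training indices by squared distance to the query.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> sqDist(train[i], query)));
        // Majority vote among the k nearest labels.
        int[] votes = new int[10];
        for (int i = 0; i < k; i++) votes[labels[order[i]]]++;
        int best = 0;
        for (int c = 1; c < votes.length; c++) {
            if (votes[c] > votes[best]) best = c;
        }
        return best;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
        int[] labels = {0, 0, 1, 1};
        // Query near the first cluster; 3 nearest neighbours vote 0, 0, 1.
        System.out.println(classify(train, labels, new double[]{0.2, 0.3}, 3)); // prints 0
    }
}
```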



--


[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-06-25 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-115189878
  
Just a correction, the functionality you will need is in #832 


---


[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-115180791
  
The easiest way is probably to check out her branch or the PR and then 
rebase your work on hers.


---


[jira] [Commented] (FLINK-2105) Implement Sort-Merge Outer Join algorithm

2015-06-25 Thread Ricky Pogalz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600934#comment-14600934
 ] 

Ricky Pogalz commented on FLINK-2105:
-

Hi,

first of all thanks for your answers [~chiwanpark] and [~fhueske]. We have some 
more questions regarding the scope of this ticket.

# Is the implementation of the OperatorBase in the core project also part of 
this ticket or should it be part of the integration?
# Same question for the Driver. Is the integration of the Iterators into the 
Driver part of this ticket?
# Just for understanding: is it sufficient to integrate the OuterJoinIterators 
into the existing MatchDriver, or do we have to create a separate Driver?

Thanks

 Implement Sort-Merge Outer Join algorithm
 -

 Key: FLINK-2105
 URL: https://issues.apache.org/jira/browse/FLINK-2105
 Project: Flink
  Issue Type: Sub-task
  Components: Local Runtime
Reporter: Fabian Hueske
Assignee: Ricky Pogalz
Priority: Minor
 Fix For: pre-apache


 Flink does not natively support outer joins at the moment. 
 This issue proposes to implement a sort-merge outer join algorithm that can 
 cover left, right, and full outer joins.
 The implementation can be based on the regular sort-merge join iterator 
 ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}, see also 
 {{MatchDriver}} class)
 The Reusing and NonReusing variants differ in whether object instances are 
 reused or new objects are created. I would start with the NonReusing variant 
 which is safer from a user's point of view and should also be easier to 
 implement.
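
The core merge logic can be sketched in plain Java (a simplification that assumes unique, pre-sorted keys; the real iterators must also handle duplicate keys and typed records): advance two cursors in key order, emit matches, and pad the side with no partner with null. Left and right outer joins keep only one padded side; the full outer join keeps both.

```java
import java.util.ArrayList;
import java.util.List;

// Sort-merge FULL outer join on two key-sorted int arrays with unique keys.
public class SortMergeOuterJoinSketch {
    public static List<String> fullOuterJoin(int[] left, int[] right) {
        List<String> out = new ArrayList<>();
        int l = 0, r = 0;
        while (l < left.length && r < right.length) {
            if (left[l] == right[r]) {
                out.add(left[l] + "|" + right[r]); // matching keys
                l++; r++;
            } else if (left[l] < right[r]) {
                out.add(left[l] + "|null"); // no partner on the right
                l++;
            } else {
                out.add("null|" + right[r]); // no partner on the left
                r++;
            }
        }
        // Drain whichever side still has elements.
        while (l < left.length) out.add(left[l++] + "|null");
        while (r < right.length) out.add("null|" + right[r++]);
        return out;
    }

    public static void main(String[] args) {
        // Inputs must be sorted on the join key.
        System.out.println(fullOuterJoin(new int[]{1, 2, 4}, new int[]{2, 3, 4}));
        // prints [1|null, 2|2, null|3, 4|4]
    }
}
```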



--