[jira] [Created] (FLINK-2155) Add an additional checkstyle validation for illegal imports
Lokesh Rajaram created FLINK-2155: - Summary: Add an additional checkstyle validation for illegal imports Key: FLINK-2155 URL: https://issues.apache.org/jira/browse/FLINK-2155 Project: Flink Issue Type: Improvement Reporter: Lokesh Rajaram Assignee: Lokesh Rajaram Add an additional checkstyle validation for illegal imports. To begin with, the following two package imports are marked as illegal: 1. org.apache.commons.lang3.Validate 2. org.apache.flink.shaded.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571638#comment-14571638 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108615858 I thought about what @StephanEwen said about uncheckpointed sources also having the locking object in the signature of the run() method, and also about extensibility. We might have to tweak the source interface a little bit more. What I propose is to have this run method: ``` void run(SourceContext context); ``` Then the source context would have methods to retrieve the locking object (for checkpointed sources) and for emitting elements. Part of my motivation for this is that it can be extended in the future without breaking existing sources. If we introduce proper timestamps at some point, we can extend the SourceContext with a method for emitting elements with a timestamp. Then, if we want to have watermarks, the context can have methods for activating automatically generated watermarks and for emitting watermarks. And so on... I think we should fix this now, before the release. What do you think? Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108615858 I thought about what @StephanEwen said about uncheckpointed sources also having the locking object in the signature of the run() method and also about extensibility. We might have to tweak the source interface a little bit more. What I propose is to have this run method: ``` void run(SourceContext context); ``` Then the source context would have methods to retrieve the locking object (for checkpointed sources) and for emitting elements. Part of my motivation for this is that this can be extended in the future without breaking existing sources. If we introduce proper timestamps at some point we can extend the SourceContext with a method for emitting elements with a timestamp. Then, if we want to have watermarks the context can have methods for activating automatically generated watermarks and for emitting watermarks. And so on... I think we should fix this now, before the release. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
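To make the proposal concrete, the extensible source interface described in the comment could look roughly like the sketch below. All names, generics, and the example source are illustrative assumptions based on the discussion, not the API that was eventually merged into Flink:

```java
// Sketch of the proposed extensible source interface. Names are
// illustrative assumptions, not the actual Flink API.
public class SourceContextSketch {

    /** Context handed to a source's run() method. */
    public interface SourceContext<T> {
        /** Emit one element. */
        void collect(T element);
        /** Lock that synchronizes emission with checkpointing. */
        Object getCheckpointLock();
        // Future extensions could be added here without breaking
        // existing sources, e.g.:
        // void collectWithTimestamp(T element, long timestamp);
        // void emitWatermark(Watermark mark);
    }

    public interface SourceFunction<T> {
        void run(SourceContext<T> ctx) throws Exception;
    }

    /** A trivial source using the sketched interface. */
    public static class CounterSource implements SourceFunction<Long> {
        private final long n;
        public CounterSource(long n) { this.n = n; }
        @Override
        public void run(SourceContext<Long> ctx) {
            for (long i = 0; i < n; i++) {
                // Checkpointed sources emit under the lock so that
                // snapshots see a consistent state.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(i);
                }
            }
        }
    }
}
```

The key design point is that adding methods to `SourceContext` later (timestamps, watermarks) leaves existing `SourceFunction` implementations untouched.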
[GitHub] flink pull request: [wip] [FLINK-2136] Adding DataStream tests for...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/771#issuecomment-108573493 Please fix the scala style issues. :)
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571489#comment-14571489 ] ASF GitHub Bot commented on FLINK-2069: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/759 writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path.
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/759
[GitHub] flink pull request: [hotfix] Remove execute() after print() in Tab...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/735
[jira] [Created] (FLINK-2154) ActiveTriggerPolicy collects elements to a closed buffer after job finishes
Márton Balassi created FLINK-2154: - Summary: ActiveTriggerPolicy collects elements to a closed buffer after job finishes Key: FLINK-2154 URL: https://issues.apache.org/jira/browse/FLINK-2154 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi When gracefully finishing a time windowing job, I have witnessed the following exceptions. The thread triggering the active policy tries to collect the data, even though the buffer pool has already been destroyed because the operator has finished. 16:06:44,419 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts) (1/4) (a28aa6212bd0c4eca271c133eb86a223) switched from RUNNING to FINISHED 16:06:44,417 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten (891e684ea2c1d5155b2363057035fca0) 16:06:44,419 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts)(1/4) switched to FINISHED 16:06:44,419 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts) (3/4) (2e22e0ba633eb1319adc9b48ed7ff477) switched from RUNNING to FINISHED 16:06:44,420 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts)(3/4) switched to FINISHED 16:06:44,420 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten (dadf928e00ab1b6f9685bf08e8d447d8) 16:06:44,421 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten 
(e211fc565e81fbdaeda0c20702c83fab) 16:06:44,421 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten (e92b325076707068b32780d30a3355a9) 16:06:44,422 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (2/4) (891e684ea2c1d5155b2363057035fca0) switched from RUNNING to FINISHED 16:06:44,422 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (1/4) (dadf928e00ab1b6f9685bf08e8d447d8) switched from RUNNING to FINISHED 16:06:44,422 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(2/4) switched to FINISHED 16:06:44,423 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(1/4) switched to FINISHED 16:06:44,423 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (3/4) (e211fc565e81fbdaeda0c20702c83fab) switched from RUNNING to FINISHED 16:06:44,424 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(3/4) switched to FINISHED 16:06:44,424 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (4/4) (e92b325076707068b32780d30a3355a9) switched from RUNNING to FINISHED 16:06:44,424 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(4/4) switched to FINISHED 16:06:49,151 ERROR org.apache.flink.streaming.api.collector.StreamOutput - Emit failed due to: java.lang.IllegalStateException: Buffer pool is destroyed. 
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:144) at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:92) at org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:58) at org.apache.flink.streaming.api.collector.StreamOutput.collect(StreamOutput.java:62) at org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:40) at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:34) at org.apache.flink.streaming.runtime.tasks.OutputHandler$OperatorCollector.collect(OutputHandler.java:232) at org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:40) at
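One common way to avoid this class of shutdown race (a sketch only, not the fix that was actually applied in Flink; all names here are illustrative) is to have the asynchronous trigger thread check a running flag, set during operator shutdown, before it collects into the output:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of guarding an asynchronous trigger thread against emitting
// into an operator whose buffer pool is already destroyed.
// Illustrative only; not the actual Flink fix.
public class GuardedEmitter {
    private final List<String> buffer = new ArrayList<>();
    private boolean running = true;

    /** Called by the active trigger thread; drops the element once closed. */
    public synchronized boolean tryCollect(String element) {
        if (!running) {
            return false; // operator finished; emitting would fail
        }
        buffer.add(element);
        return true;
    }

    /** Called when the operator finishes; later emissions are rejected. */
    public synchronized void close() {
        running = false;
    }

    public synchronized int size() {
        return buffer.size();
    }
}
```

The emit and close paths synchronize on the same monitor, so the trigger thread can never observe a half-closed operator.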
[jira] [Commented] (FLINK-2136) Test the streaming scala API
[ https://issues.apache.org/jira/browse/FLINK-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571484#comment-14571484 ] ASF GitHub Bot commented on FLINK-2136: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/771#issuecomment-108573493 Please fix the scala style issues. :) Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core.
[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports
[ https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571577#comment-14571577 ] Ufuk Celebi commented on FLINK-2155: Thanks! Are there more Validate versions we could exclude? I think lang3 is a commons version? Add an additional checkstyle validation for illegal imports --- Key: FLINK-2155 URL: https://issues.apache.org/jira/browse/FLINK-2155 Project: Flink Issue Type: Improvement Reporter: Lokesh Rajaram Assignee: Lokesh Rajaram Add an additional checkstyle validation for illegal imports. To begin with, the following two package imports are marked as illegal: 1. org.apache.commons.lang3.Validate 2. org.apache.flink.shaded.*
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108627084 Can you elaborate? Why are there backwards events after the connection is closed? The iteration head should not close until the iteration terminates, in which case there should be no back events any more.
[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
[ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571716#comment-14571716 ] ASF GitHub Bot commented on FLINK-2134: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108627084 Can you elaborate? Why are there backwards events after the connection is closed? The iteration head should not close until the iteration terminates, in which case there should be no back events any more. Deadlock in SuccessAfterNetworkBuffersFailureITCase --- Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times: {code} cluster = new ForkableFlinkMiniCluster(config, false); for (int i = 0; i < 100; i++) { // run test programs CC, KMeans, CC } {code} The iteration tasks wait for superstep notifications like this: {code} Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6) daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57) - locked 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs.
[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
[ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571748#comment-14571748 ] ASF GitHub Bot commented on FLINK-2134: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108630865 Makes sense. +1 to get this in! Deadlock in SuccessAfterNetworkBuffersFailureITCase --- Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times: {code} cluster = new ForkableFlinkMiniCluster(config, false); for (int i = 0; i < 100; i++) { // run test programs CC, KMeans, CC } {code} The iteration tasks wait for superstep notifications like this: {code} Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6) daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57) - locked 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs.
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108630865 Makes sense. +1 to get this in!
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31677964 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/HITS.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; +import org.apache.flink.api.common.aggregators.LongSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.Hits; + +import java.io.Serializable; + + +/** + * + * This class implements the HITS algorithm by using flink Gelly API +*Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, +*developed by Jon Kleinberg. 
+ * + * The algorithm performs a series of iterations, each consisting of two basic steps: + * + * Authority Update: Update each node's Authority score to be equal to the sum of the Hub Scores of each node that + * points to it. + * That is, a node is given a high authority score by being linked from pages that are recognized as Hubs for information. + * Hub Update: Update each node's Hub Score to be equal to the sum of the Authority Scores of each node that it + * points to. + * That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. + * + * The Hub score and Authority score for a node is calculated with the following algorithm: + * *Start with each node having a hub score and authority score of 1. + * *Run the Authority Update Rule + * *Run the Hub Update Rule + * *Normalize the values by dividing each Hub score by square root of the sum of the squares of all Hub scores, and + * dividing each Authority score by square root of the sum of the squares of all Authority scores. + * *Repeat from the second step as necessary. + * + * http://en.wikipedia.org/wiki/HITS_algorithm --- End diff -- I would add an @see annotation for the link here, check the community detection library method.
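The update steps quoted in the javadoc above can be sketched in plain Java on an adjacency matrix. This is illustrative only; the Gelly implementation under review expresses the same updates with vertex-centric iterations rather than matrices:

```java
import java.util.Arrays;

// Plain-Java sketch of the HITS steps: start all scores at 1, run the
// authority update, then the hub update, then normalize each vector by
// the square root of its sum of squares. Not the Gelly implementation.
public class HitsSketch {

    /** adj[i][j] == true means an edge i -> j. Returns {hubs, authorities}. */
    public static double[][] run(boolean[][] adj, int iterations) {
        int n = adj.length;
        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);   // every node starts with score 1
        Arrays.fill(auth, 1.0);
        for (int it = 0; it < iterations; it++) {
            // Authority update: sum of hub scores of in-neighbors.
            for (int j = 0; j < n; j++) {
                auth[j] = 0;
                for (int i = 0; i < n; i++) {
                    if (adj[i][j]) auth[j] += hub[i];
                }
            }
            // Hub update: sum of the new authority scores of out-neighbors.
            for (int i = 0; i < n; i++) {
                hub[i] = 0;
                for (int j = 0; j < n; j++) {
                    if (adj[i][j]) hub[i] += auth[j];
                }
            }
            normalize(hub);
            normalize(auth);
        }
        return new double[][] { hub, auth };
    }

    private static void normalize(double[] v) {
        double sumSq = 0;
        for (double x : v) sumSq += x * x;
        double norm = Math.sqrt(sumSq);
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) v[i] /= norm;
        }
    }
}
```

On the chain 0 -> 1 -> 2, node 0 ends up a pure hub (it links to an authority but nothing links to it) and node 2 a pure authority.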
[jira] [Closed] (FLINK-1907) Scala Interactive Shell
[ https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1907. --- Scala Interactive Shell --- Key: FLINK-1907 URL: https://issues.apache.org/jira/browse/FLINK-1907 Project: Flink Issue Type: New Feature Components: Scala API Reporter: Nikolaas Steenbergen Assignee: Nikolaas Steenbergen Priority: Minor Fix For: 0.9 Build an interactive shell for the Scala API.
[jira] [Resolved] (FLINK-1907) Scala Interactive Shell
[ https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1907. - Resolution: Implemented Fix Version/s: 0.9 Scala Interactive Shell --- Key: FLINK-1907 URL: https://issues.apache.org/jira/browse/FLINK-1907 Project: Flink Issue Type: New Feature Components: Scala API Reporter: Nikolaas Steenbergen Assignee: Nikolaas Steenbergen Priority: Minor Fix For: 0.9 Build an interactive shell for the Scala API.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571672#comment-14571672 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108619295 Yes, the change can basically be done by a regex, so I also propose merging this as early as possible now. By the way, we could ensure that the source is actually holding the monitor lock with `Thread.holdsLock(obj)`. Not sure about the performance impact, though. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108619295 Yes, the change can basically be done by a regex so I also propose merging this as early as possible now. By the way, we could ensure that the source is actually holding the monitor lock with `Thread.holdsLock(obj)`. Not sure about the performance impact, though. 
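The `Thread.holdsLock(obj)` check mentioned above is standard JDK API and works like this minimal sketch (the class and method names here are illustrative, not Flink code):

```java
// Minimal demonstration of Thread.holdsLock(obj): the emit path can
// verify that the calling thread holds the checkpoint lock. Names are
// illustrative, not the actual Flink classes.
public class HoldsLockSketch {
    private final Object checkpointLock = new Object();

    public Object getCheckpointLock() {
        return checkpointLock;
    }

    /** True iff the calling thread currently holds the checkpoint lock. */
    public boolean emittingUnderLock() {
        return Thread.holdsLock(checkpointLock);
    }
}
```

The check is per calling thread, so it could be placed (or asserted) inside the emit path to catch sources that forget to synchronize.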
[GitHub] flink pull request: Remove extra HTML tags in TypeInformation Java...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/766#issuecomment-108619345 +1, fair to merge :-)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618236 How about we still merge this now, to make sure we have a good version in to start testing? The change you propose is API only, and would not affect internals/timings...
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571809#comment-14571809 ] ASF GitHub Bot commented on FLINK-1319: --- Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108640926 Hey Ufuk, thank you very much for reviewing my code, and all others for the feedback! I tried to consider all your feedback (I hope I didn't forget anything). I did a large refactoring again, added some comments to important parts of the code, and fixed some bugs. I also added some additional test cases. I hope the PR is now ready to be merged (if the build succeeds) :) Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's optimizer takes information that tells it, for UDFs, which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
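To illustrate what "forwarded/constant field" information means for the optimizer, here is a plain-Java stand-in (it deliberately does not use Flink's actual annotation or tuple classes; the record shape and names are assumptions for illustration):

```java
// Plain-Java illustration of a "constant/forwarded field": this map
// copies field 0 of the input unchanged into field 0 of the output,
// so an optimizer that knows this can keep a partitioning or sort
// order on field 0 across the map, avoiding a re-shuffle.
// Sketch only; not Flink's actual semantic-annotation API.
public class ForwardedFieldSketch {

    /** A two-field record, e.g. (key, value). */
    public static final class Tuple2 {
        public final long f0;
        public final String f1;
        public Tuple2(long f0, String f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    /** Field 0 is forwarded verbatim; only field 1 is modified. */
    public static Tuple2 map(Tuple2 in) {
        return new Tuple2(in.f0, in.f1.toUpperCase());
    }
}
```

The static analysis proposed in the issue would derive "field 0 is forwarded" automatically from the UDF's bytecode, instead of requiring the user to annotate it.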
[jira] [Resolved] (FLINK-2070) Confusing methods print() that print on client vs on TaskManager
[ https://issues.apache.org/jira/browse/FLINK-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2070. - Resolution: Fixed Assignee: Stephan Ewen Fixed via 11643c0cc79eabe02e952e6fbd56d7a55166b623 Confusing methods print() that print on client vs on TaskManager Key: FLINK-2070 URL: https://issues.apache.org/jira/browse/FLINK-2070 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 With the {{print()}} method printing on the client, the {{print(sourceIdentifier)}} method becomes confusing, as it has the same name, but prints on the TaskManager (into its out files) and executes lazily. We should clarify the confusion by picking a more descriptive name, like {{printOnTaskManager()}}. I am not sure how common the use case of printing into the TaskManager {{out}} files is. We could remove that method and point to the {{PrintingOutputFormat}} for that.
[jira] [Closed] (FLINK-2070) Confusing methods print() that print on client vs on TaskManager
[ https://issues.apache.org/jira/browse/FLINK-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2070. --- Confusing methods print() that print on client vs on TaskManager Key: FLINK-2070 URL: https://issues.apache.org/jira/browse/FLINK-2070 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 With the {{print()}} method printing on the client, the {{print(sourceIdentifier)}} method becomes confusing, as it has the same name, but prints on the TaskManager (into its out files) and executes lazily. We should clarify the confusion by picking a more descriptive name, like {{printOnTaskManager()}}. I am not sure how common the use case of printing into the TaskManager {{out}} files is. We could remove that method and point to the {{PrintingOutputFormat}} for that.
[jira] [Commented] (FLINK-2151) Provide interface to distinguish close() calls in error and regular cases
[ https://issues.apache.org/jira/browse/FLINK-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571704#comment-14571704 ] Stephan Ewen commented on FLINK-2151: - For rich functions, we can add a {{signalCancel()}} method. That breaks nothing, because they are regular classes. For data sinks, it is more tricky. We can add a {{CancelableSink}} Provide interface to distinguish close() calls in error and regular cases - Key: FLINK-2151 URL: https://issues.apache.org/jira/browse/FLINK-2151 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.9 Reporter: Robert Metzger I was talking to somebody who is interested in contributing a {{flink-cassandra}} connector. The connector will create Cassandra files locally (on the TaskManagers) and bulk-load them in the {{close()}} method. For user functions, it is currently not possible to find out whether the function is being closed due to an error or a regular end. The simplest approach would be passing an additional argument (enum or boolean) into the close() method, indicating the type of closing. But that would break all existing code. Another approach would be to add an interface that has such an extended close method, e.g. {{RichCloseFunction}}.
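The two directions discussed in the comment and the issue could be sketched as follows. All interface and class names here are illustrative assumptions drawn from the discussion, not a committed Flink API:

```java
// Sketch of the two options discussed for FLINK-2151: an extra
// signalCancel() hook on rich functions, and a close(boolean) style
// interface for sinks. Names are illustrative, not a Flink API.
public class CloseSignalSketch {

    /** Option 1: an extra hook invoked only on error/cancellation. */
    public interface Cancelable {
        void signalCancel();
    }

    /** Option 2: close() that knows whether the job ended regularly. */
    public interface CancelableSink {
        void close(boolean regularEnd) throws Exception;
    }

    /** Example sink that bulk-loads only on a regular end. */
    public static class BulkLoadingSink implements CancelableSink {
        public boolean loaded = false;

        @Override
        public void close(boolean regularEnd) {
            if (regularEnd) {
                loaded = true; // e.g. trigger the Cassandra bulk load here
            }
            // On error, a real sink would instead clean up its local files.
        }
    }
}
```

Option 1 is backward compatible for rich functions (a new method on a class); option 2 needs a new interface so existing sinks with the plain `close()` signature keep compiling.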
[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
[ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571730#comment-14571730 ] ASF GitHub Bot commented on FLINK-2134: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108628742 No, there are no backwards events *after* the channel is closed. The sync sends out the backwards events, then closes. But the close could overtake the unflushed backwards events. This led to a deadlock, because the head was waiting on termination events, which never arrived. Deadlock in SuccessAfterNetworkBuffersFailureITCase --- Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times:
{code}
cluster = new ForkableFlinkMiniCluster(config, false);

for (int i = 0; i < 100; i++) {
    // run test programs CC, KMeans, CC
}
{code}
The iteration tasks wait for superstep notifications like this:
{code}
"Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6)" daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0007f89e3440> (a java.lang.Object)
	at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
	- locked <0x0007f89e3440> (a java.lang.Object)
	at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108627853 I was thinking the same thing, about `Thread.holdsLock(obj)`. That call probably costs way more than the lock itself, though. Would be nice to have a debug mode, to activate it there. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571725#comment-14571725 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108627853 I was thinking the same thing, about `Thread.holdsLock(obj)`. That call probably costs way more than the lock itself, though. Would be nice to have a debug mode, to activate it there. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
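The debug-mode idea above (checking lock ownership only when it costs nothing in production) can be sketched with JVM assertions. This is an editor's illustration, not Flink's actual `SourceContext` API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source context: element emission is guarded by an assert on
// Thread.holdsLock(), so the relatively expensive check only runs when the
// JVM is started with -ea, i.e. in a "debug mode".
public class LockingSourceContext<T> {
    private final Object checkpointLock;
    private final List<T> emitted = new ArrayList<>(); // stand-in for the downstream

    public LockingSourceContext(Object checkpointLock) {
        this.checkpointLock = checkpointLock;
    }

    public Object getCheckpointLock() {
        return checkpointLock;
    }

    public void collect(T element) {
        // Thread.holdsLock probably costs more than the lock itself,
        // hence assert-only
        assert Thread.holdsLock(checkpointLock) : "collect() called without the checkpoint lock";
        emitted.add(element);
    }

    public List<T> getEmitted() {
        return emitted;
    }
}
```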
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108628742 No, there are no backwards events *after* the channel is closed. The sync sends out the backwards events, then closes. But the close could overtake the unflushed backwards events. This led to a deadlock, because the head was waiting on termination events, which never arrived.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571743#comment-14571743 ] ASF GitHub Bot commented on FLINK-1319: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108630545 Great review, Ufuk. I agree with @uce and @rmetzger to add a comment on how to disable it (in case). Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder.
This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108630545 Great review, Ufuk. I agree with @uce and @rmetzger to add a comment how to disable it (in case).
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108630661 Ah, it is a close from the receiver end, got it.
[GitHub] flink pull request: Hits
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-108634942 Hi @mfahimazizi, The common practice is to prefix the commit with the Jira issue, e.g. [FLINK-20XX][gelly]. Also, you should play with your IDE's settings. Right now, indentation is performed with spaces and it should be with tabs. This is why Travis is failing. To make sure the code works, do a `cd flink-staging/flink-gelly` and then `mvn verify`. A success there equals a Travis success 98% of the time :) After implementing an algorithm, it's always good to check whether it works correctly by writing a test (check the test/example/ folder :) ). I would also update the documentation here. In the Library method section, add an entry *HITS*, perhaps with the full name of the acronym as well. I left some comments in-line. Overall this does not look bad at all!
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618014 I generally like that idea. Especially the extensibility with respect to timestamps and watermark generation is a good point. Retrieving the lock object from the context is not very obvious, but then again, someone who implements a fault tolerant exactly-once source should read the javadocs and have a look at an example.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571665#comment-14571665 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618236 How about we still merge this now, to make sure we have a good version in to start testing? The change you propose is API only, and would not affect internals/timings... Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571661#comment-14571661 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618014 I generally like that idea. Especially the extensibility with respect to timestamps and watermark generation is a good point. Retrieving the lock object from the context is not very obvious, but then again, someone who implements a fault tolerant exactly-once source should read the javadocs and have a look at an example. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571677#comment-14571677 ] Stephan Ewen commented on FLINK-1967: - With next version, you mean 0.10, right? I am still in favor. Making all windows data driven (timestamps are data) should help simplify the implementation as well. Introduce (Event)time in Streaming -- Key: FLINK-1967 URL: https://issues.apache.org/jira/browse/FLINK-1967 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek This requires introducing a timestamp in the streaming record and a change in the sources to add timestamps to records. This will also introduce punctuations (or low watermarks) to allow windows to work correctly on unordered, timestamped input data. In the process of this, the windowing subsystem also needs to be adapted to use the punctuations. Furthermore, all operators need to be made aware of punctuations and correctly forward them. Then, a new operator must be introduced to allow modification of timestamps.
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31678387

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.example;
+
+import org.apache.flink.api.common.ProgramDescription;
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.library.HITS;
+import org.apache.flink.graph.utils.Hits;
+import org.apache.flink.graph.utils.Tuple3ToEdgeMap;
+import org.apache.flink.util.Collector;
+
+/**
+ * This program is an example for HITS algorithm.
+ * the result is either a hub value or authority value base user selection.
+ *
+ * If no arguments are provided, the example runs with a random graph of 10 vertices
+ * and random edge weights.
+ */
+
+public class HITSExample implements ProgramDescription {
+	@SuppressWarnings("serial")
+	public static void main(String[] args) throws Exception {
+
+		if(!parseParameters(args)) {
+			return;
+		}
+
+		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+		DataSet<Edge<Long, String>> links = getEdgesDataSet(env);
+
+		Graph<Long, Double, String> network = Graph.fromDataSet(links, new MapFunction<Long, Double>() {
+
+			public Double map(Long value) throws Exception {
+				return 1.0;
+			}
+		}, env);
+
+		// add graph to HITS class with iteration value and hub or authority enum value.
+		DataSet<Vertex<Long, Double>> HitsValue = network.run(
+				new HITS<Long>(Hits.AUTHORITY, maxIterations)).getVertices();
+
+		if (fileOutput) {
+			HitsValue.writeAsCsv(outputPath, "\n", "\t");
+		} else {
+			HitsValue.print();
+		}
+		//env.execute("HITS algorithm");
+	}
+
+	@Override
+	public String getDescription() {
+		return "HITS algorithm example";
+	}
+
+	// *
+	// UTIL METHODS
+	// *
+
+	private static boolean fileOutput = false;
+	private static long numPages = 10;
+	private static String edgeInputPath = null;
+	private static String outputPath = null;
+	private static int maxIterations = 5;
+
+	private static boolean parseParameters(String[] args) {
+
+		if(args.length > 0) {
+			if(args.length != 3) {
+				System.err.println("Usage: PageRank <input edges path> <output path> <num iterations>");
--- End diff --

It's no longer Page Rank ;) let's not confuse people!
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31678355

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.example;
+
+import org.apache.flink.api.common.ProgramDescription;
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.library.HITS;
+import org.apache.flink.graph.utils.Hits;
+import org.apache.flink.graph.utils.Tuple3ToEdgeMap;
+import org.apache.flink.util.Collector;
+
+/**
+ * This program is an example for HITS algorithm.
+ * the result is either a hub value or authority value base user selection.
+ *
+ * If no arguments are provided, the example runs with a random graph of 10 vertices
+ * and random edge weights.
+ */
+
+public class HITSExample implements ProgramDescription {
+	@SuppressWarnings("serial")
+	public static void main(String[] args) throws Exception {
+
+		if(!parseParameters(args)) {
+			return;
+		}
+
+		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+		DataSet<Edge<Long, String>> links = getEdgesDataSet(env);
+
+		Graph<Long, Double, String> network = Graph.fromDataSet(links, new MapFunction<Long, Double>() {
+
+			public Double map(Long value) throws Exception {
+				return 1.0;
+			}
+		}, env);
+
+		// add graph to HITS class with iteration value and hub or authority enum value.
+		DataSet<Vertex<Long, Double>> HitsValue = network.run(
+				new HITS<Long>(Hits.AUTHORITY, maxIterations)).getVertices();
+
+		if (fileOutput) {
+			HitsValue.writeAsCsv(outputPath, "\n", "\t");
+		} else {
+			HitsValue.print();
+		}
+		//env.execute("HITS algorithm");
--- End diff --

The env.execute() caused problems for you because you were only testing print(). You don't need to do an env.execute() on the else branch as print() now has the same status as count() and collect(), but this command is still needed on the then branch (i.e. after writeAsCsv).
[jira] [Commented] (FLINK-1528) Add local clustering coefficient library method and example
[ https://issues.apache.org/jira/browse/FLINK-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570573#comment-14570573 ] ASF GitHub Bot commented on FLINK-1528: --- Github user balidani commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108284218 Yeah, I should definitely finish this! I'll take a look tonight, sorry about that :) Add local clustering coefficient library method and example --- Key: FLINK-1528 URL: https://issues.apache.org/jira/browse/FLINK-1528 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Assignee: Daniel Bali Add a gelly library method and example to compute the local clustering coefficient.
[GitHub] flink pull request: [FLINK-1528][Gelly] Added Local Clustering Coe...
Github user balidani commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108284218 Yeah, I should definitely finish this! I'll take a look tonight, sorry about that :)
[GitHub] flink pull request: [contrib] Storm compatibility
Github user szape commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108284180 The first 10 commits are not rebased on the current master.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570586#comment-14570586 ] Vasia Kalavri commented on FLINK-1520: -- Hey [~cebe]! One more ping to you :) If you're not working on this, can I release this issue? Thanks! Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Carsten Brandt Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs.
[jira] [Updated] (FLINK-2147) Approximate calculation of frequencies in data streams
[ https://issues.apache.org/jira/browse/FLINK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2147: --- Labels: statistics (was: ) Approximate calculation of frequencies in data streams -- Key: FLINK-2147 URL: https://issues.apache.org/jira/browse/FLINK-2147 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Priority: Minor Labels: statistics Count-Min sketch is a hashing-based algorithm for approximately keeping track of the frequencies of elements in a data stream. It is described by Cormode et al. in the following paper: http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf Note that this algorithm can be conveniently implemented in a distributed way, as described in section 3.2 of the paper. The paper http://www.vldb.org/conf/2002/S10P03.pdf also describes algorithms for approximately keeping track of frequencies, but here the user can specify a threshold below which she is not interested in the frequency of an element. The error bounds are also different from those of the Count-Min sketch algorithm.
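The update/query scheme of the Count-Min sketch described above can be sketched as follows; this is an editor's illustration with simplified per-row hashing (the paper uses a pairwise-independent hash family), not code from the paper or from Flink:

```java
import java.util.Random;

// Minimal Count-Min sketch: d rows of w counters, one hash per row.
// add() increments one counter per row; estimate() takes the minimum
// over the rows, so frequencies are never under-estimated.
public class CountMinSketch {
    private final long[][] counts;
    private final int[] hashA; // per-row multipliers (simplified hashing)
    private final int depth, width;

    public CountMinSketch(int depth, int width, long seed) {
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.hashA = new int[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            hashA[i] = rnd.nextInt(Integer.MAX_VALUE - 1) + 1; // non-zero
        }
    }

    private int bucket(int row, Object item) {
        return Math.floorMod(hashA[row] * item.hashCode(), width);
    }

    public void add(Object item) {
        for (int i = 0; i < depth; i++) {
            counts[i][bucket(i, item)]++;
        }
    }

    public long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, counts[i][bucket(i, item)]);
        }
        return min;
    }
}
```

Two sketches built with the same seed merge by element-wise addition of their counter arrays, which is what makes the distributed variant in section 3.2 of the paper convenient.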
[jira] [Created] (FLINK-2148) Approximately calculate the number of distinct elements of a stream
Gabor Gevay created FLINK-2148: -- Summary: Approximately calculate the number of distinct elements of a stream Key: FLINK-2148 URL: https://issues.apache.org/jira/browse/FLINK-2148 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Priority: Minor In the paper http://people.seas.harvard.edu/~minilek/papers/f0.pdf Kane et al. describe an optimal algorithm for estimating the number of distinct elements in a data stream.
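As a much simpler (non-optimal) alternative to the Kane et al. algorithm referenced above, distinct counting is often sketched with a k-minimum-values (KMV) estimator; this is an editor's illustration, with a hypothetical hash-mixing scheme:

```java
import java.util.TreeSet;

// KMV estimator: keep the k smallest hash values seen so far. If the k-th
// smallest hash sits at fraction p of the hash space, roughly k/p distinct
// elements have been observed.
public class KMVDistinctCounter {
    private final int k;
    private final TreeSet<Long> smallest = new TreeSet<>();

    public KMVDistinctCounter(int k) {
        this.k = k;
    }

    private static long hash(Object item) {
        long h = item.hashCode() * 0x9E3779B97F4A7C15L; // mixing constant (assumption)
        h ^= h >>> 32;
        return h & Long.MAX_VALUE; // keep it non-negative
    }

    public void add(Object item) {
        long h = hash(item);
        if (smallest.size() < k) {
            smallest.add(h); // duplicates are ignored by the set
        } else if (h < smallest.last() && !smallest.contains(h)) {
            smallest.add(h);
            smallest.pollLast(); // evict the previous k-th smallest
        }
    }

    public double estimate() {
        if (smallest.size() < k) {
            return smallest.size(); // fewer than k distinct hashes: exact
        }
        return (k - 1) * ((double) Long.MAX_VALUE / smallest.last());
    }
}
```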
[jira] [Created] (FLINK-2150) Add a library method that assigns unique Long values to vertices
Vasia Kalavri created FLINK-2150: Summary: Add a library method that assigns unique Long values to vertices Key: FLINK-2150 URL: https://issues.apache.org/jira/browse/FLINK-2150 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Priority: Minor In some graph algorithms, it is required to initialize the vertex values with unique values (e.g. label propagation). This issue proposes adding a Gelly library method that receives an input graph and initializes its vertex values with unique Long values. This method can then also be used to improve the MusicProfiles example.
[jira] [Updated] (FLINK-2145) Median calculation for windows
[ https://issues.apache.org/jira/browse/FLINK-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2145: --- Labels: statistics (was: ) Median calculation for windows -- Key: FLINK-2145 URL: https://issues.apache.org/jira/browse/FLINK-2145 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: statistics The PreReducer for this has the following algorithm: We maintain two multisets (as, for example, balanced binary search trees) that always partition the elements of the current window into smaller-than-median and larger-than-median elements. At each store and evict, we can maintain this invariant with only O(1) multiset operations.
[jira] [Created] (FLINK-2145) Median calculation for windows
Gabor Gevay created FLINK-2145: -- Summary: Median calculation for windows Key: FLINK-2145 URL: https://issues.apache.org/jira/browse/FLINK-2145 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor The PreReducer for this has the following algorithm: We maintain two multisets (as, for example, balanced binary search trees) that always partition the elements of the current window into smaller-than-median and larger-than-median elements. At each store and evict, we can maintain this invariant with only O(1) multiset operations.
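The two-multiset scheme described above can be sketched like this; an editor's illustration using TreeMap-backed multisets (O(1)-many multiset operations per store/evict, each costing the usual O(log n) of a balanced tree):

```java
import java.util.TreeMap;

// Two multisets partition the window: `low` holds the smaller half
// (including the median), `high` the larger half. store/evict each trigger
// at most O(1) multiset moves to restore the size invariant.
public class MedianMaintainer {
    private final TreeMap<Double, Integer> low = new TreeMap<>();
    private final TreeMap<Double, Integer> high = new TreeMap<>();
    private int lowSize = 0, highSize = 0;

    private static void inc(TreeMap<Double, Integer> m, double v) {
        m.merge(v, 1, Integer::sum);
    }

    private static void dec(TreeMap<Double, Integer> m, double v) {
        // drops the key when its multiplicity reaches zero
        m.merge(v, -1, (a, b) -> a + b == 0 ? null : a + b);
    }

    public void store(double v) {
        if (lowSize == 0 || v <= low.lastKey()) { inc(low, v); lowSize++; }
        else { inc(high, v); highSize++; }
        rebalance();
    }

    // assumes v is currently in the window
    public void evict(double v) {
        if (lowSize > 0 && low.containsKey(v)) { dec(low, v); lowSize--; }
        else { dec(high, v); highSize--; }
        rebalance();
    }

    private void rebalance() {
        while (lowSize > highSize + 1) {
            double v = low.lastKey();
            dec(low, v); lowSize--; inc(high, v); highSize++;
        }
        while (highSize > lowSize) {
            double v = high.firstKey();
            dec(high, v); highSize--; inc(low, v); lowSize++;
        }
    }

    public double median() {
        return lowSize == highSize
                ? (low.lastKey() + high.firstKey()) / 2.0
                : low.lastKey();
    }
}
```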
[jira] [Updated] (FLINK-2144) Implement count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2144: --- Labels: statistics (was: ) Implement count, average, and variance for windows -- Key: FLINK-2144 URL: https://issues.apache.org/jira/browse/FLINK-2144 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: statistics By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143.
[jira] [Updated] (FLINK-2144) Implement count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2144: --- Description: By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143: Store: O(1) Evict: O(1) emitWindow: O(1) memory: O(1) was: By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143. Implement count, average, and variance for windows -- Key: FLINK-2144 URL: https://issues.apache.org/jira/browse/FLINK-2144 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: statistics By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143: Store: O(1) Evict: O(1) emitWindow: O(1) memory: O(1)
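The constant-time PreReducer described above boils down to maintaining a running count, sum, and sum of squares; a minimal sketch (editor's illustration, not the Flink PreReducer API):

```java
// O(1) store, evict, and emit: count, sum, and sum of squares suffice to
// report count, average, and (population) variance of the current window.
public class WindowStats {
    private long count;
    private double sum, sumSq;

    public void store(double v) { count++; sum += v; sumSq += v * v; }

    // assumes v was previously stored
    public void evict(double v) { count--; sum -= v; sumSq -= v * v; }

    public long count() { return count; }

    public double average() { return sum / count; }

    // population variance; the subtraction can lose precision on
    // long-running windows (Welford-style updates are more robust)
    public double variance() {
        double m = average();
        return sumSq / count - m * m;
    }
}
```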
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570572#comment-14570572 ] ASF GitHub Bot commented on FLINK-1707: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/649#issuecomment-108284020 Hey @joey001! Are you still working on this? Let us know if you need help! Add an Affinity Propagation Library Method -- Key: FLINK-1707 URL: https://issues.apache.org/jira/browse/FLINK-1707 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: joey Priority: Minor This issue proposes adding an implementation of the Affinity Propagation algorithm as a Gelly library method and a corresponding example. The algorithm is described in paper [1] and a description of a vertex-centric implementation can be found in [2]. [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
[jira] [Created] (FLINK-2149) Simplify Gelly Jaccard similarity example
Vasia Kalavri created FLINK-2149: Summary: Simplify Gelly Jaccard similarity example Key: FLINK-2149 URL: https://issues.apache.org/jira/browse/FLINK-2149 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Trivial The Gelly Jaccard similarity example can be simplified by replacing the groupReduceOnEdges method with the simpler reduceOnEdges.
[jira] [Updated] (FLINK-1759) Execution statistics for vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-1759: - Labels: (was: easyfix starter) Execution statistics for vertex-centric iterations -- Key: FLINK-1759 URL: https://issues.apache.org/jira/browse/FLINK-1759 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Minor It would be nice to add an option for gathering execution statistics from VertexCentricIteration. In particular, the following metrics could be useful: - total number of supersteps - number of messages sent (total / per superstep) - bytes of messages exchanged (total / per superstep) - execution time (total / per superstep)
[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...
Github user FGoessler commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-108310843 The travis build is failing on Oracle JDK 8. Maven or Flink are hanging according to the build log. Can anyone help or at least restart the build? Are there any known flipping tests? Imo the failure isn't related to our changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570676#comment-14570676 ] ASF GitHub Bot commented on FLINK-1731: --- Github user FGoessler commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-108310843 The travis build is failing on Oracle JDK 8. Maven or Flink are hanging according to the build log. Can anyone help or at least restart the build? Are there any known flipping tests? Imo the failure isn't related to our changes. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
[GitHub] flink pull request: [FLINK-1707][WIP]Add an Affinity Propagation L...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/649#issuecomment-108284020 Hey @joey001! Are you still working on this? Let us know if you need help!
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108292671 I thought this is a clean branch. I am working on this currently...
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604864 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -309,8 +207,10 @@ object MultipleLinearRegression { : DataSet[LabeledVector] = { --- End diff -- Docstring for the return type?
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570509#comment-14570509 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604864 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -309,8 +207,10 @@ object MultipleLinearRegression { : DataSet[LabeledVector] = { --- End diff -- Docstring for the return type? Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2137) Expose partitionByHash for WindowedDataStream
[ https://issues.apache.org/jira/browse/FLINK-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570528#comment-14570528 ] Márton Balassi commented on FLINK-2137: --- I am personally fine with not having it; if there are no objections, please mark it as not a problem. Expose partitionByHash for WindowedDataStream - Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet.
[jira] [Assigned] (FLINK-2136) Test the streaming scala API
[ https://issues.apache.org/jira/browse/FLINK-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gábor Hermann reassigned FLINK-2136: Assignee: Gábor Hermann Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core.
[jira] [Commented] (FLINK-1526) Add Minimum Spanning Tree library method and example
[ https://issues.apache.org/jira/browse/FLINK-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570554#comment-14570554 ] ASF GitHub Bot commented on FLINK-1526: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/434#issuecomment-108278253 Hey @andralungu! I think we should close this one. We can't really continue from this state anyway. I guess we'll have to revisit this problem once we have for-loop iteration support. Add Minimum Spanning Tree library method and example Key: FLINK-1526 URL: https://issues.apache.org/jira/browse/FLINK-1526 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Assignee: Andra Lungu This issue proposes the addition of a library method and an example for distributed minimum spanning tree in Gelly. The DMST algorithm is very interesting because it is quite different from PageRank-like iterative graph algorithms. It consists of distinct phases inside the same iteration and requires a mechanism to detect convergence of one phase to proceed to the next one. Current implementations in vertex-centric models are quite long (1000 lines) and hard to understand. You can find a description of the algorithm [here | http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf] and [here | http://www.vldb.org/pvldb/vol7/p1047-han.pdf]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2146) Fast calculation of min/max with arbitrary eviction and triggers
Gabor Gevay created FLINK-2146: -- Summary: Fast calculation of min/max with arbitrary eviction and triggers Key: FLINK-2146 URL: https://issues.apache.org/jira/browse/FLINK-2146 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Priority: Minor The last algorithm described here could be used: http://codercareer.blogspot.com/2012/02/no-33-maximums-in-sliding-windows.html It is based on a double-ended queue which maintains a sorted list of elements of the current window that have the possibility of being the maximal element in the future. Store: O(1) amortized Evict: O(1) emitWindow: O(1) memory: O(N)
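The deque-based algorithm referenced above can be sketched as follows (illustrative Java, independent of Flink's windowing API). The deque holds indices of elements that could still become the window maximum, kept in decreasing value order, which gives the O(1) amortized store/evict and O(N) memory bounds mentioned in the issue:

```java
import java.util.ArrayDeque;

// Sliding-window maximum with a monotonic double-ended queue.
// Standalone sketch, not Flink code.
class SlidingWindowMax {
    static int[] maxPerWindow(int[] a, int w) {
        int[] out = new int[a.length - w + 1];
        // Indices of candidate maxima; their values are strictly decreasing.
        ArrayDeque<Integer> dq = new ArrayDeque<>();
        for (int i = 0; i < a.length; i++) {
            // Evict the front index once it falls out of the window.
            while (!dq.isEmpty() && dq.peekFirst() <= i - w) dq.pollFirst();
            // Drop elements dominated by the new one; they can never be the max.
            while (!dq.isEmpty() && a[dq.peekLast()] <= a[i]) dq.pollLast();
            dq.addLast(i);
            // Emit the current maximum once the first full window is formed.
            if (i >= w - 1) out[i - w + 1] = a[dq.peekFirst()];
        }
        return out;
    }
}
```

Each index enters and leaves the deque at most once, so the total work over N elements is O(N), i.e. O(1) amortized per store/evict.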
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570601#comment-14570601 ] PJ Van Aeken commented on FLINK-1962: - [~ssc], you can find an implementation to play with in my fork (branch scala-gelly-api). It has all of the functionalities from the Java API except for a few utility methods for creating graphs, and I am also still working on Vertex Centric Iterations and Gather Sum Apply Iterations. Other than that most of it should be there, although I am a couple of commits behind. Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken
[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/695#issuecomment-108290010 Thanks for your work. :smile:
[jira] [Commented] (FLINK-2048) Enhance Twitter Stream support
[ https://issues.apache.org/jira/browse/FLINK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570605#comment-14570605 ] ASF GitHub Bot commented on FLINK-2048: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/695 Enhance Twitter Stream support -- Key: FLINK-2048 URL: https://issues.apache.org/jira/browse/FLINK-2048 Project: Flink Issue Type: Task Components: Streaming Affects Versions: master Reporter: Hilmi Yildirim Assignee: Hilmi Yildirim Original Estimate: 2h Remaining Estimate: 2h Flink does not have real Twitter support. It only has a TwitterSource which uses a sample stream that cannot be used properly for analysis. It is possible to use external tools to create streams (e.g. Kafka), but it is beneficial to create a proper Twitter stream in Flink.
[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/695
[jira] [Updated] (FLINK-1759) Execution statistics for vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-1759: - Labels: easyfix starter (was: ) Execution statistics for vertex-centric iterations -- Key: FLINK-1759 URL: https://issues.apache.org/jira/browse/FLINK-1759 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Minor Labels: easyfix, starter It would be nice to add an option for gathering execution statistics from VertexCentricIteration. In particular, the following metrics could be useful: - total number of supersteps - number of messages sent (total / per superstep) - bytes of messages exchanged (total / per superstep) - execution time (total / per superstep) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108292481 You can just add one commit at the end; please do not rewrite the complete history for this. That is unnecessary overhead. If you feel confident about it we can even put it in the 0.9 release.
[jira] [Commented] (FLINK-1528) Add local clustering coefficient library method and example
[ https://issues.apache.org/jira/browse/FLINK-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570567#comment-14570567 ] ASF GitHub Bot commented on FLINK-1528: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108283573 Hey @balidani! Would you like to finish this up? It's not really urgent, but it's almost finished and it'd be a pity to abandon :) Someone else could also take over of course. Just let us know! Add local clustering coefficient library method and example --- Key: FLINK-1528 URL: https://issues.apache.org/jira/browse/FLINK-1528 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Assignee: Daniel Bali Add a Gelly library method and example to compute the local clustering coefficient.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570607#comment-14570607 ] Vasia Kalavri commented on FLINK-1962: -- Awesome news [~vanaepi]! Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken
[GitHub] flink pull request: [streaming] Consolidate streaming API method n...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/761#issuecomment-108295020 Merging.
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596533 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Hits.java --- @@ -0,0 +1,14 @@ +package org.apache.flink.graph.utils; --- End diff -- Missing Apache license header
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596508 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java --- @@ -0,0 +1,113 @@ +package org.apache.flink.graph.example; --- End diff -- Missing Apache license header
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604529 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -87,11 +89,11 @@ import org.apache.flink.ml.pipeline.{FitOperation, PredictOperation, Predictor} * */ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] { - + import org.apache.flink.ml._ --- End diff -- Line 49 typo: iteratinos -> iterations
[jira] [Commented] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570522#comment-14570522 ] ASF GitHub Bot commented on FLINK-2130: --- GitHub user mbalassi opened a pull request: https://github.com/apache/flink/pull/767 [FLINK-2130] [streaming] RMQ Source properly propagates exceptions You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink flink-2130 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/767.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #767 commit 2b784f09493767ca5b6388ac692406466dc55575 Author: mbalassi mbala...@apache.org Date: 2015-06-03T09:16:48Z [FLINK-2130] [streaming] RMQ Source properly propagates exceptions RabbitMQ source does not fail when failing to retrieve elements --- Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Components: Streaming, Streaming Connectors Reporter: Stephan Ewen Assignee: Márton Balassi The RMQ source only logs when elements cannot be retrieved. Failures are not propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2137) Expose partitionByHash for WindowedDataStream
[ https://issues.apache.org/jira/browse/FLINK-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570525#comment-14570525 ] Gyula Fora commented on FLINK-2137: --- I don't think this makes too much sense for the windowing case. The groupBy with keyselector should be enough. Expose partitionByHash for WindowedDataStream - Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570367#comment-14570367 ] ASF GitHub Bot commented on FLINK-1319: --- Github user twalthr commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31597259 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- Manual annotations should always trump optimizer annotations. The analyzer can not determine all semantic properties. E.g. when using KeySelectors. The user should still have the possibility to override semantic properties to add more properties. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}. We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. 
I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user twalthr commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31597259 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- Manual annotations should always trump optimizer annotations. The analyzer can not determine all semantic properties. E.g. when using KeySelectors. The user should still have the possibility to override semantic properties to add more properties.
[GitHub] flink pull request: [docs/javadoc][hotfix] Corrected Join hint and...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/763#issuecomment-108262606 looks good, +1.
[GitHub] flink pull request: [FLINK-2098] Improvements on checkpoint-aligne...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/755#issuecomment-108218506 @StephanEwen I took the changes, added them on top of my PR and added some more refinements.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570392#comment-14570392 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/755#issuecomment-108218506 @StephanEwen I took the changes, added them on top of my PR and added some more refinements. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570492#comment-14570492 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604106 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala --- @@ -35,7 +35,7 @@ import org.apache.flink.ml.common.{FlinkMLTools, ParameterMap, WithParameters} * * @tparam Self Type of the implementing class */ -trait Predictor[Self] extends Estimator[Self] with WithParameters with Serializable { +trait Predictor[Self] extends Estimator[Self] with WithParameters { --- End diff -- Why is Serializable no longer needed? Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604106 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala --- @@ -35,7 +35,7 @@ import org.apache.flink.ml.common.{FlinkMLTools, ParameterMap, WithParameters} * * @tparam Self Type of the implementing class */ -trait Predictor[Self] extends Estimator[Self] with WithParameters with Serializable { +trait Predictor[Self] extends Estimator[Self] with WithParameters { --- End diff -- Why is Serializable no longer needed?
[jira] [Closed] (FLINK-2137) Expose partitionByHash for WindowedDataStream
[ https://issues.apache.org/jira/browse/FLINK-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-2137. - Resolution: Not A Problem Expose partitionByHash for WindowedDataStream - Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet.
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570503#comment-14570503 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604529 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -87,11 +89,11 @@ import org.apache.flink.ml.pipeline.{FitOperation, PredictOperation, Predictor} * */ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] { - + import org.apache.flink.ml._ --- End diff -- Line 49 typo: iteratinos -> iterations Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged.
[jira] [Created] (FLINK-2143) Add an overload to reduceWindow which takes the inverse of the reduceFunction as a second parameter
Gabor Gevay created FLINK-2143: -- Summary: Add an overload to reduceWindow which takes the inverse of the reduceFunction as a second parameter Key: FLINK-2143 URL: https://issues.apache.org/jira/browse/FLINK-2143 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Assignee: Gabor Gevay If the inverse of the reduceFunction is also available (for example subtraction when summing numbers), then a PreReducer can maintain the aggregate in O(1) memory and O(1) time for evict, store, and emitWindow.
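A minimal sketch of the idea behind FLINK-2143 (class and method names here are illustrative, not part of Flink's API): when the inverse of the reduce function is known, the window aggregate can be updated in O(1) on store by applying the reduce function, and in O(1) on evict by applying the inverse, instead of re-reducing the whole window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BinaryOperator;

// Hypothetical PreReducer-style helper: maintains a window aggregate
// incrementally, assuming `inverse` undoes `reduce` (e.g. subtraction
// undoing addition).
public class InvertibleWindowReducer<T> {
    private final BinaryOperator<T> reduce;   // e.g. (a, b) -> a + b
    private final BinaryOperator<T> inverse;  // e.g. (agg, b) -> agg - b
    private final Deque<T> window = new ArrayDeque<>();
    private T aggregate;                      // null means empty window

    public InvertibleWindowReducer(BinaryOperator<T> reduce, BinaryOperator<T> inverse) {
        this.reduce = reduce;
        this.inverse = inverse;
    }

    public void store(T element) {            // O(1) per element
        window.addLast(element);
        aggregate = (aggregate == null) ? element : reduce.apply(aggregate, element);
    }

    public void evict() {                     // O(1): removes the oldest element
        T oldest = window.removeFirst();
        aggregate = window.isEmpty() ? null : inverse.apply(aggregate, oldest);
    }

    public T emitWindow() {                   // current window aggregate
        return aggregate;
    }
}
```

For summing numbers, `reduce` would be `Integer::sum` and `inverse` would be `(agg, x) -> agg - x`; without the inverse, eviction would force an O(n) re-reduce of the remaining elements.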
[jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features
[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570462#comment-14570462 ] Theodore Vasiloudis commented on FLINK-2030: Is there a PR for this issue? Implement an online histogram with Merging and equalization features Key: FLINK-2030 URL: https://issues.apache.org/jira/browse/FLINK-2030 Project: Flink Issue Type: Sub-task Components: Machine Learning Library Reporter: Sachin Goel Assignee: Sachin Goel Priority: Minor Labels: ML For the implementation of the decision tree in https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a histogram with online updates, merging and equalization features. A reference implementation is provided in [1]. [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108260360 What exactly is broken with the commits? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2141) Allow GSA's Gather to perform this operation in more than one direction
Andra Lungu created FLINK-2141: -- Summary: Allow GSA's Gather to perform this operation in more than one direction Key: FLINK-2141 URL: https://issues.apache.org/jira/browse/FLINK-2141 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu For the time being, a vertex only gathers information from its in-edges. Similarly to the vertex-centric approach, we would like to allow users to gather data from out and all edges as well. This property should be set using a setDirection() method.
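To make the proposal concrete, here is a hedged sketch (not Gelly's actual API; the enum and method are illustrative) of what a configurable gather direction could mean: depending on the setting, a vertex collects values over its in-edges, its out-edges, or both.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: gathers neighbor vertex ids over an edge list,
// in the direction selected by a hypothetical EdgeDirection setting.
public class GatherDirectionSketch {
    public enum EdgeDirection { IN, OUT, ALL }

    // edges[i] = {source, target}
    public static List<Integer> gatherNeighbors(int vertex, int[][] edges, EdgeDirection dir) {
        List<Integer> neighbors = new ArrayList<>();
        for (int[] e : edges) {
            // in-edge of `vertex`: target == vertex; out-edge: source == vertex
            if ((dir == EdgeDirection.IN || dir == EdgeDirection.ALL) && e[1] == vertex) {
                neighbors.add(e[0]);
            }
            if ((dir == EdgeDirection.OUT || dir == EdgeDirection.ALL) && e[0] == vertex) {
                neighbors.add(e[1]);
            }
        }
        return neighbors;
    }
}
```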
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/760#issuecomment-108254611 Looks good, some minor comments.
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570511#comment-14570511 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/760#issuecomment-108254611 Looks good, some minor comments. Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains an SGD optimizer which should replace the custom implementation once the framework is merged.
[jira] [Created] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
Gabor Gevay created FLINK-2142: -- Summary: GSoC project: Exact and Approximate Statistics for Data Streams and Windows Key: FLINK-2142 URL: https://issues.apache.org/jira/browse/FLINK-2142 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom preaggregators. The exact calculation of some statistics (eg. frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds.
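As one example of the sublinear-memory structures the description alludes to for approximate frequencies, here is a minimal count-min sketch (a standard technique; the class and its parameters are illustrative, not part of any Flink API). It uses a fixed depth x width array of counters instead of one counter per distinct element, and its estimates never undercount the true frequency.

```java
import java.util.Random;

// Minimal count-min sketch: approximate frequency counting in
// O(depth * width) memory, independent of the number of elements seen.
public class CountMinSketch {
    private final int depth;
    private final int width;
    private final long[][] counts;
    private final int[] seeds;   // one hash seed per row

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(42);          // fixed seed for reproducibility
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    private int bucket(Object item, int row) {
        int h = item.hashCode() ^ seeds[row];
        h ^= (h >>> 16);                      // spread the hash bits a little
        return Math.floorMod(h, width);
    }

    public void add(Object item) {
        for (int i = 0; i < depth; i++) {
            counts[i][bucket(item, i)]++;
        }
    }

    // Never underestimates the true frequency; may overestimate on collisions.
    public long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, counts[i][bucket(item, i)]);
        }
        return min;
    }
}
```

The error bound is tunable through depth and width, which matches the "user-specified error bounds" mentioned in the project description.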
[jira] [Created] (FLINK-2144) Implement count, average, and variance for windows
Gabor Gevay created FLINK-2144: -- Summary: Implement count, average, and variance for windows Key: FLINK-2144 URL: https://issues.apache.org/jira/browse/FLINK-2144 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143.
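A sketch of why these statistics build naturally on FLINK-2143 (names are illustrative, not Flink API): count, sum, and sum of squares all have inverses, so each can be updated in O(1) both when an element enters the window and when one is evicted, and average and variance are derived from them.

```java
// Hypothetical window statistics holder: O(1) store and evict, using the
// inverse-update idea from FLINK-2143 (addition undone by subtraction).
public class WindowStatistics {
    private long count;
    private double sum;
    private double sumOfSquares;

    public void store(double x) {
        count++;
        sum += x;
        sumOfSquares += x * x;
    }

    public void evict(double x) {             // x must be a previously stored element
        count--;
        sum -= x;
        sumOfSquares -= x * x;
    }

    public long count() {
        return count;
    }

    public double average() {
        return sum / count;
    }

    // Population variance via E[X^2] - (E[X])^2
    public double variance() {
        double mean = sum / count;
        return sumOfSquares / count - mean * mean;
    }
}
```

One caveat worth noting: the sum-of-squares formulation can lose floating-point precision on long-running windows with large values, so a production implementation might prefer a numerically stabler update scheme.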
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570389#comment-14570389 ] Aljoscha Krettek commented on FLINK-1967: - Are we still interested in having this in the next version? If yes, the windowing system will have to be reworked, meaning the whole shebang: policies, operators, window optimisations ... The problem is that the current model only works when elements arrive in order, while working on user timestamps would require out-of-order processing. Introduce (Event)time in Streaming -- Key: FLINK-1967 URL: https://issues.apache.org/jira/browse/FLINK-1967 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek This requires introducing a timestamp in streaming record and a change in the sources to add timestamps to records. This will also introduce punctuations (or low watermarks) to allow windows to work correctly on unordered, timestamped input data. In the process of this, the windowing subsystem also needs to be adapted to use the punctuations. Furthermore, all operators need to be made aware of punctuations and correctly forward them. Then, a new operator must be introduced to allow modification of timestamps.
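A small sketch of the punctuation/low-watermark mechanism the issue describes (this is not Flink's implementation; the class is illustrative): elements are buffered until a watermark promises that no element with a smaller timestamp will arrive, at which point everything up to the watermark can be processed in timestamp order, even if it arrived out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative watermark buffer: holds back out-of-order elements and
// releases them in timestamp order once a watermark guarantees completeness.
public class WatermarkBuffer<T> {
    private static class Timestamped<T> {
        final long timestamp;
        final T value;
        Timestamped(long timestamp, T value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final PriorityQueue<Timestamped<T>> buffer =
        new PriorityQueue<>((a, b) -> Long.compare(a.timestamp, b.timestamp));

    public void element(long timestamp, T value) {
        buffer.add(new Timestamped<>(timestamp, value));
    }

    // Releases all buffered elements with timestamp <= watermark, in order.
    public List<T> watermark(long watermark) {
        List<T> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll().value);
        }
        return ready;
    }
}
```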
[jira] [Created] (FLINK-2140) Access the number of vertices from within the GSA functions
Andra Lungu created FLINK-2140: -- Summary: Access the number of vertices from within the GSA functions Key: FLINK-2140 URL: https://issues.apache.org/jira/browse/FLINK-2140 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Similarly to the vertex-centric approach, we would like to allow the user to access the number of vertices from the Gather, Sum and Apply functions respectively. This property will become available by setting the numVertices option to true via setOptNumVertices(). The number of vertices can then be accessed in the gather, sum and apply functions using the getNumberOfVertices() method. If the option is not set in the configuration, this method will return -1.
[jira] [Assigned] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Theodore Vasiloudis reassigned FLINK-1993: -- Assignee: Till Rohrmann (was: Theodore Vasiloudis) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Till Rohrmann Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains an SGD optimizer which should replace the custom implementation once the framework is merged.
[jira] [Updated] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2142: --- Description: The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom PreReducers. The exact calculation of some statistics (eg. frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds. was: The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom preaggregators. The exact calculation of some statistics (eg. frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds. GSoC project: Exact and Approximate Statistics for Data Streams and Windows --- Key: FLINK-2142 URL: https://issues.apache.org/jira/browse/FLINK-2142 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: gsoc2015, statistics, streaming The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom PreReducers. The exact calculation of some statistics (eg. 
frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds.
[GitHub] flink pull request: [FLINK-1526][gelly] [work in progress] Added M...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/434#issuecomment-108278253 Hey @andralungu! I think we should close this one. We can't really continue from this state anyway. I guess we'll have to revisit this problem once we have for-loop iteration support.
[GitHub] flink pull request: [FLINK-1528][Gelly] Added Local Clustering Coe...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108283573 Hey @balidani! Would you like to finish this up? It's not really urgent, but it's almost finished and it'd be a pity to abandon :) Someone else could also take over of course. Just let us know!