[GitHub] flink pull request: Fix handling of receiver and sender task failu...

2015-06-01 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/746#issuecomment-107337457
  
Merging now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2125) String delimiter for SocketTextStream

2015-06-01 Thread JIRA
Márton Balassi created FLINK-2125:
-

 Summary: String delimiter for SocketTextStream
 Key: FLINK-2125
 URL: https://issues.apache.org/jira/browse/FLINK-2125
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Priority: Minor


The SocketTextStreamFunction uses a character delimiter, while other parts of 
the API use a String delimiter.
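A minimal sketch of what a String-based delimiter could look like (class and method names are illustrative, not Flink API): buffer the incoming bytes and emit a record whenever the buffer ends with the delimiter string, instead of comparing single characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit a record whenever the buffered input ends with the
// multi-character delimiter. For simplicity this reads byte-by-byte and assumes
// single-byte characters; a real implementation would decode properly.
public class StringDelimiterSplitter {

    public static List<String> readRecords(InputStream in, String delimiter) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            buffer.append((char) b);
            int start = buffer.length() - delimiter.length();
            // Emit a record once the buffer ends with the delimiter.
            if (start >= 0 && buffer.substring(start).equals(delimiter)) {
                records.add(buffer.substring(0, start));
                buffer.setLength(0);
            }
        }
        if (buffer.length() > 0) {
            records.add(buffer.toString()); // trailing record without delimiter
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("a##b##c".getBytes(StandardCharsets.UTF_8));
        System.out.println(readRecords(in, "##")); // [a, b, c]
    }
}
```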



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-01 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2126:
-

 Summary: Scala shell tests sporadically fail on travis
 Key: FLINK-2126
 URL: https://issues.apache.org/jira/browse/FLINK-2126
 Project: Flink
  Issue Type: Bug
  Components: Scala Shell
Affects Versions: 0.9
Reporter: Robert Metzger


See https://travis-ci.org/rmetzger/flink/jobs/64893149





[jira] [Commented] (FLINK-2120) Rename AbstractJobVertex to JobVertex

2015-06-01 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567209#comment-14567209
 ] 

Maximilian Michels commented on FLINK-2120:
---

+1

 Rename AbstractJobVertex to JobVertex
 -

 Key: FLINK-2120
 URL: https://issues.apache.org/jira/browse/FLINK-2120
 Project: Flink
  Issue Type: Wish
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Trivial

 I would like to rename AbstractJobVertex to JobVertex. It is not abstract and 
 we have a lot of references to it in tests, where we create instances. This 
 is trivial, but I think it is a bad name.





[GitHub] flink pull request: [FLINK-2037] Provide flink-python.jar in lib/

2015-06-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/691




[jira] [Commented] (FLINK-2037) Unable to start Python API using ./bin/pyflink*.sh

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567216#comment-14567216
 ] 

ASF GitHub Bot commented on FLINK-2037:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/691


 Unable to start Python API using ./bin/pyflink*.sh
 --

 Key: FLINK-2037
 URL: https://issues.apache.org/jira/browse/FLINK-2037
 Project: Flink
  Issue Type: Bug
  Components: Python API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Critical

 Calling {{./bin/pyflink3.sh}} will lead to
 {code}
 ./bin/pyflink3.sh
 log4j:WARN No appenders could be found for logger 
 (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
 info.
 JAR file does not exist: 
 /home/robert/incubator-flink/build-target/bin/../lib/flink-python-0.9-SNAPSHOT.jar
 Use the help option (-h or --help) to get help on the command.
 {code}
 This is due to the script expecting a {{flink-python-0.9-SNAPSHOT.jar}} file 
 to exist in {{lib}} (it's wrong anyway that the version name is included 
 here; it should be replaced by a {{*}}).
 I'll look into the issue ...
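The wildcard fix suggested above could be sketched as follows (the function name and demo paths are hypothetical, not the actual pyflink scripts):

```shell
# Hypothetical sketch of the suggested fix: locate the Python API jar in lib/
# with a wildcard instead of hard-coding the version in the file name.

find_python_jar() {
    # Print the first flink-python*.jar found directly inside the given
    # directory; fail (non-zero exit) when there is none.
    find "$1" -maxdepth 1 -name 'flink-python*.jar' | head -n 1 | grep .
}

# Demo: a lib directory containing a versioned jar is still found by the glob.
mkdir -p demo_lib
touch demo_lib/flink-python-0.9-SNAPSHOT.jar
find_python_jar demo_lib
```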





[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567225#comment-14567225
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107419997
  
OK, I will review this.

I vote to stick to Stephan's suggested approach instead of package based 
exclusions: analyze everything and allow exclusions with a `@SkipCodeAnalysis` 
annotation.

Any further opinions on the output of the analysis (stdout vs. logging 
question)?
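A minimal sketch of such an opt-out annotation, assuming runtime retention and a reflection check on the UDF class (the gate class and method names are illustrative, not Flink API):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch of the opt-out approach discussed above: analyze every
// UDF by default, and skip only classes carrying @SkipCodeAnalysis.
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface SkipCodeAnalysis {}

public class AnalysisGate {

    // Returns true when the given UDF class should be analyzed.
    static boolean shouldAnalyze(Class<?> udfClass) {
        return !udfClass.isAnnotationPresent(SkipCodeAnalysis.class);
    }

    @SkipCodeAnalysis
    static class OptedOutMapper {}

    static class RegularMapper {}

    public static void main(String[] args) {
        System.out.println(shouldAnalyze(RegularMapper.class));  // true
        System.out.println(shouldAnalyze(OptedOutMapper.class)); // false
    }
}
```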


 Add static code analysis for UDFs
 -

 Key: FLINK-1319
 URL: https://issues.apache.org/jira/browse/FLINK-1319
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Stephan Ewen
Assignee: Timo Walther
Priority: Minor

 Flink's Optimizer takes information that tells it for UDFs which fields of 
 the input elements are accessed, modified, or forwarded/copied. This 
 information frequently helps to reuse partitionings, sorts, etc. It may speed 
 up programs significantly, as it can frequently eliminate sorts and shuffles, 
 which are costly.
 Right now, users can add lightweight annotations to UDFs to provide this 
 information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
 We worked with static code analysis of UDFs before, to determine this 
 information automatically. This is an incredible feature, as it magically 
 makes programs faster.
 For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
 works surprisingly well in many cases. We used the Soot toolkit for the 
 static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
 not include any of the code so far.
 I propose to add this functionality to Flink, in the form of a drop-in 
 addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
 simply download a special flink-code-analysis.jar and drop it into the 
 lib folder to enable this functionality. We may even add a script to 
 tools that downloads that library automatically into the lib folder. This 
 should be legally fine, since we do not redistribute LGPL code and only 
 dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
 patentability, if I remember correctly).
 Prior work on this has been done by [~aljoscha] and [~skunert], which could 
 provide a code base to start with.
 *Appendix*
 Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
 Papers on static analysis and for optimization: 
 http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
 http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
 Quick introduction to the Optimizer: 
 http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
 (Section 6)
 Optimizer for Iterations: 
 http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
 (Sections 4.3 and 5.3)





[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-01 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107422340
  
+1 for the annotation.

I'm against using stdout. Logging frameworks are much better at controlling 
the output.
The quickstart mvn archetype provides a log4j.properties file, so we can 
configure it the way we want it to.




[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567244#comment-14567244
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107422340
  
+1 for the annotation.

I'm against using stdout. Logging frameworks are much better at controlling 
the output.
The quickstart mvn archetype provides a log4j.properties file, so we can 
configure it the way we want it to.


 Add static code analysis for UDFs
 -

 Key: FLINK-1319
 URL: https://issues.apache.org/jira/browse/FLINK-1319
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Stephan Ewen
Assignee: Timo Walther
Priority: Minor

 Flink's Optimizer takes information that tells it for UDFs which fields of 
 the input elements are accessed, modified, or forwarded/copied. This 
 information frequently helps to reuse partitionings, sorts, etc. It may speed 
 up programs significantly, as it can frequently eliminate sorts and shuffles, 
 which are costly.
 Right now, users can add lightweight annotations to UDFs to provide this 
 information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
 We worked with static code analysis of UDFs before, to determine this 
 information automatically. This is an incredible feature, as it magically 
 makes programs faster.
 For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
 works surprisingly well in many cases. We used the Soot toolkit for the 
 static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
 not include any of the code so far.
 I propose to add this functionality to Flink, in the form of a drop-in 
 addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
 simply download a special flink-code-analysis.jar and drop it into the 
 lib folder to enable this functionality. We may even add a script to 
 tools that downloads that library automatically into the lib folder. This 
 should be legally fine, since we do not redistribute LGPL code and only 
 dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
 patentability, if I remember correctly).
 Prior work on this has been done by [~aljoscha] and [~skunert], which could 
 provide a code base to start with.
 *Appendix*
 Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
 Papers on static analysis and for optimization: 
 http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
 http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
 Quick introduction to the Optimizer: 
 http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
 (Section 6)
 Optimizer for Iterations: 
 http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
 (Sections 4.3 and 5.3)





[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...

2015-06-01 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/695#issuecomment-107422496
  
Is this good to merge now? I'm rewriting all the sources (again) and this 
should probably go in.




[jira] [Assigned] (FLINK-1430) Add test for streaming scala api completeness

2015-06-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi reassigned FLINK-1430:
-

Assignee: Márton Balassi  (was: Mingliang Qi)

 Add test for streaming scala api completeness
 -

 Key: FLINK-1430
 URL: https://issues.apache.org/jira/browse/FLINK-1430
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Márton Balassi

 Currently the completeness of the streaming Scala API is not tested.





[GitHub] flink pull request: New operator state interfaces

2015-06-01 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/747#discussion_r31420656
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/api/persistent/PersistentKafkaSource.java
 ---
@@ -140,11 +144,14 @@ public void open(Configuration parameters) throws 
Exception {
// most likely the number of offsets we're going to store here 
will be lower than the number of partitions.
int numPartitions = getNumberOfPartitions();
LOG.debug("The topic {} has {} partitions", topicName, numPartitions);
-   this.lastOffsets = new long[numPartitions];
-   this.commitedOffsets = new long[numPartitions];
-   Arrays.fill(this.lastOffsets, -1);
-   Arrays.fill(this.commitedOffsets, 0); // just to make it clear
-
+   
+   long[] defaultOffset = new long[numPartitions];
+   Arrays.fill(defaultOffset, -1);
+   
this.lastOffsets = getRuntimeContext().getOperatorState("offset", defaultOffset);
+   
+   //TODO: commit fetched offset to ZK if not default
--- End diff --

We can not merge the PR with this TODO open.




[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567146#comment-14567146
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107393190
  
Let us merge this for 0.9 and have it deactivated by default.

Let's gradually activate it in the next releases as it gets exposure.


 Add static code analysis for UDFs
 -

 Key: FLINK-1319
 URL: https://issues.apache.org/jira/browse/FLINK-1319
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Stephan Ewen
Assignee: Timo Walther
Priority: Minor

 Flink's Optimizer takes information that tells it for UDFs which fields of 
 the input elements are accessed, modified, or forwarded/copied. This 
 information frequently helps to reuse partitionings, sorts, etc. It may speed 
 up programs significantly, as it can frequently eliminate sorts and shuffles, 
 which are costly.
 Right now, users can add lightweight annotations to UDFs to provide this 
 information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
 We worked with static code analysis of UDFs before, to determine this 
 information automatically. This is an incredible feature, as it magically 
 makes programs faster.
 For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
 works surprisingly well in many cases. We used the Soot toolkit for the 
 static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
 not include any of the code so far.
 I propose to add this functionality to Flink, in the form of a drop-in 
 addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
 simply download a special flink-code-analysis.jar and drop it into the 
 lib folder to enable this functionality. We may even add a script to 
 tools that downloads that library automatically into the lib folder. This 
 should be legally fine, since we do not redistribute LGPL code and only 
 dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
 patentability, if I remember correctly).
 Prior work on this has been done by [~aljoscha] and [~skunert], which could 
 provide a code base to start with.
 *Appendix*
 Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
 Papers on static analysis and for optimization: 
 http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
 http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
 Quick introduction to the Optimizer: 
 http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
 (Section 6)
 Optimizer for Iterations: 
 http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
 (Sections 4.3 and 5.3)





[jira] [Resolved] (FLINK-2037) Unable to start Python API using ./bin/pyflink*.sh

2015-06-01 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-2037.
---
Resolution: Fixed

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/2f347a03

 Unable to start Python API using ./bin/pyflink*.sh
 --

 Key: FLINK-2037
 URL: https://issues.apache.org/jira/browse/FLINK-2037
 Project: Flink
  Issue Type: Bug
  Components: Python API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Critical

 Calling {{./bin/pyflink3.sh}} will lead to
 {code}
 ./bin/pyflink3.sh
 log4j:WARN No appenders could be found for logger 
 (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
 info.
 JAR file does not exist: 
 /home/robert/incubator-flink/build-target/bin/../lib/flink-python-0.9-SNAPSHOT.jar
 Use the help option (-h or --help) to get help on the command.
 {code}
 This is due to the script expecting a {{flink-python-0.9-SNAPSHOT.jar}} file 
 to exist in {{lib}} (it's wrong anyway that the version name is included 
 here; it should be replaced by a {{*}}).
 I'll look into the issue ...





[jira] [Resolved] (FLINK-1954) Task Failures and Error Handling

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-1954.

   Resolution: Fixed
Fix Version/s: 0.9

Fixed in 1aad5b759432f0b59a9dcc366a4b66c2681626f1, 
2a65b62216e8fb73fce65209bf646ca67e5f96b0, 
dce1be18593539ff29c3d55c5f2c1208a2e54c10, 
f75c16b0540c839079188bb58b5acf2ede108767.

 Task Failures and Error Handling
 

 Key: FLINK-1954
 URL: https://issues.apache.org/jira/browse/FLINK-1954
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Blocker
 Fix For: 0.9


 This is an issue to keep track of subtasks for error handling of task 
 failures.
 The design doc for this can be found here: 
 https://cwiki.apache.org/confluence/display/FLINK/Task+Failures+and+Error+Handling





[jira] [Updated] (FLINK-1955) Improve error handling of sender task failures

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-1955:
---
Fix Version/s: 0.9

 Improve error handling of sender task failures
 --

 Key: FLINK-1955
 URL: https://issues.apache.org/jira/browse/FLINK-1955
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
 Fix For: 0.9


 Currently, if a sender task fails, the produced partition is silently 
 released.
 We want:
 * Produced result partition becomes erroneous with a SenderFailedException
 * Receiver cancels itself when encountering the SenderFailedException
 * May also be cancelled by the JobManager (if that call is faster than the 
 detection of the failed sender)
 * This closes the Netty channel





[jira] [Updated] (FLINK-1941) Add documentation for Gelly-GSA

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-1941:
---
Fix Version/s: 0.9

 Add documentation for Gelly-GSA
 ---

 Key: FLINK-1941
 URL: https://issues.apache.org/jira/browse/FLINK-1941
 Project: Flink
  Issue Type: Task
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
  Labels: docs, gelly
 Fix For: 0.9


 Add a section in the Gelly guide to describe the newly introduced 
 Gather-Sum-Apply iteration method. Show how GSA uses delta iterations 
 internally and explain the differences of this model as compared to 
 vertex-centric.





[jira] [Created] (FLINK-2123) Fix CLI client logging

2015-06-01 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2123:
--

 Summary: Fix CLI client logging
 Key: FLINK-2123
 URL: https://issues.apache.org/jira/browse/FLINK-2123
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: master
Reporter: Ufuk Celebi


The CLI client complains about missing log4j configuration and prints too much 
information.





[jira] [Commented] (FLINK-2076) Bug in re-openable hash join

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567143#comment-14567143
 ] 

ASF GitHub Bot commented on FLINK-2076:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-107392568
  
Awesome, thanks a lot @chiwanpark 


 Bug in re-openable hash join
 

 Key: FLINK-2076
 URL: https://issues.apache.org/jira/browse/FLINK-2076
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Chiwan Park

 It happens deterministically in my machine with the following setup:
 TaskManager:
   - heap size: 512m
   - network buffers: 4096
   - slots: 32
 Job:
   - ConnectedComponents
   - 100k vertices
   - 1.2m edges
 -- this gives around 260 MB of Flink managed memory; across 32 slots that is 
 8 MB per slot, which, with several memory consumers in the job, makes the 
 iterative hash join go out-of-core
 {code}
 java.lang.RuntimeException: Hash Join bug in memory management: 
 Memory buffers leaked.
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
   at 
 org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
   at 
 org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102)
   at 
 org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
   at 
 org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
   at 
 org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560)
   at java.lang.Thread.run(Thread.java:745)
 {code}





[jira] [Commented] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567144#comment-14567144
 ] 

ASF GitHub Bot commented on FLINK-2004:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/674#issuecomment-107392647
  
Looks good!

Will merge this..


 Memory leak in presence of failed checkpoints in KafkaSource
 

 Key: FLINK-2004
 URL: https://issues.apache.org/jira/browse/FLINK-2004
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Robert Metzger
Priority: Critical
 Fix For: 0.9


 Checkpoints that fail never send a commit message to the tasks.
 Maintaining a map of all pending checkpoints introduces a memory leak, as 
 entries for failed checkpoints will never be removed.
 Approaches to fix this:
   - The source cleans up entries from older checkpoints once a checkpoint is 
 committed (simple implementation in a linked hash map)
   - The commit message could include the optional state handle (source needs 
 not maintain the map)
   - The checkpoint coordinator could send messages for failed checkpoints?
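The first approach above — cleaning up older entries once a checkpoint commits, with a simple linked hash map — could be sketched as follows (class and method names are hypothetical, and the per-checkpoint state is simplified to a String):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: pending checkpoint state is kept in insertion order,
// and committing checkpoint N also drops every entry with a smaller ID, so
// entries of failed (never-committed) checkpoints cannot accumulate forever.
public class PendingCheckpoints {

    // LinkedHashMap keeps insertion order, which equals checkpoint-ID order.
    private final LinkedHashMap<Long, String> pending = new LinkedHashMap<>();

    void addPending(long checkpointId, String state) {
        pending.put(checkpointId, state);
    }

    // Commit returns the committed state and removes it plus all older entries.
    String commit(long checkpointId) {
        String state = pending.get(checkpointId);
        Iterator<Map.Entry<Long, String>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey() <= checkpointId) {
                it.remove();
            } else {
                break; // insertion order: everything from here on is newer
            }
        }
        return state;
    }

    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingCheckpoints cp = new PendingCheckpoints();
        cp.addPending(1, "s1"); // this checkpoint "fails": no commit arrives
        cp.addPending(2, "s2");
        cp.addPending(3, "s3");
        cp.commit(2);           // also cleans up the failed checkpoint 1
        System.out.println(cp.size()); // 1 (only checkpoint 3 remains)
    }
}
```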





[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-107392568
  
Awesome, thanks a lot @chiwanpark 




[GitHub] flink pull request: [FLINK-2004] Fix memory leak in presence of fa...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/674#issuecomment-107392647
  
Looks good!

Will merge this..




[GitHub] flink pull request: [FLINK-2037] Provide flink-python.jar in lib/

2015-06-01 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/691#issuecomment-107391906
  
I'm going to merge this ...




[jira] [Created] (FLINK-2122) Make all internal streaming operators Checkpointable

2015-06-01 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2122:
---

 Summary: Make all internal streaming operators Checkpointable
 Key: FLINK-2122
 URL: https://issues.apache.org/jira/browse/FLINK-2122
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Aljoscha Krettek


Right now, only user state is checkpointed and restored. This should be 
extended to flink-internal operators such as reducers and windows.





[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567126#comment-14567126
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107388477
  
As per the discussion on the mailing list I'm rewriting the Source 
interface to only have the run()/cancel() variant.


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[jira] [Created] (FLINK-2124) FromElementsFunction is not really Serializable

2015-06-01 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2124:
---

 Summary: FromElementsFunction is not really Serializable
 Key: FLINK-2124
 URL: https://issues.apache.org/jira/browse/FLINK-2124
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Aljoscha Krettek


The function stores an Iterable of T. T is not necessarily Serializable, and 
the Iterable is also not necessarily Serializable.
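One possible way out, sketched under assumptions (the Codec interface and all class names are illustrative, not Flink API): serialize the elements eagerly with a type-specific serializer and keep only a byte[] in the function object, so the function itself stays Serializable regardless of T.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical type-specific serializer; must itself be Serializable, which
// plays the role Flink's own serializers would play here.
interface Codec<T> extends Serializable {
    void write(T value, DataOutputStream out) throws IOException;
    T read(DataInputStream in) throws IOException;
}

public class FromElementsSketch<T> implements Serializable {

    private final byte[] elementData; // Serializable no matter what T is
    private final int count;
    private final Codec<T> codec;

    FromElementsSketch(Codec<T> codec, List<T> elements) throws IOException {
        this.codec = codec;
        this.count = elements.size();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (T e : elements) {
                codec.write(e, out);
            }
        }
        this.elementData = bytes.toByteArray();
    }

    // Deserialize on demand, e.g. on the task side when the source runs.
    List<T> elements() throws IOException {
        List<T> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(elementData));
        for (int i = 0; i < count; i++) {
            result.add(codec.read(in));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Codec<String> utf = new Codec<String>() {
            public void write(String v, DataOutputStream out) throws IOException { out.writeUTF(v); }
            public String read(DataInputStream in) throws IOException { return in.readUTF(); }
        };
        FromElementsSketch<String> fn = new FromElementsSketch<>(utf, Arrays.asList("a", "b", "c"));
        System.out.println(fn.elements()); // [a, b, c]
    }
}
```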





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107388477
  
As per the discussion on the mailing list I'm rewriting the Source 
interface to only have the run()/cancel() variant.




[jira] [Resolved] (FLINK-1958) Improve error handling of receiver task failures

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-1958.

Resolution: Fixed

Fixed in dce1be1.

 Improve error handling of receiver task failures
 

 Key: FLINK-1958
 URL: https://issues.apache.org/jira/browse/FLINK-1958
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi

 Currently, receiver task failures are silently ignored.
 We need the following behaviour:
 * Sender keeps going. May be back-pressured when no receiver pulls the data 
 any more.
 * Sender may be cancelled by JobManager
 * Partition stays sane
 * Netty channel needs to be closed
 * Transfer needs to be canceled by a cancel message (receiver to sender)





[jira] [Updated] (FLINK-1958) Improve error handling of receiver task failures

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-1958:
---
Fix Version/s: 0.9

 Improve error handling of receiver task failures
 

 Key: FLINK-1958
 URL: https://issues.apache.org/jira/browse/FLINK-1958
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
 Fix For: 0.9


 Currently, receiver task failures are silently ignored.
 We need the following behaviour:
 * Sender keeps going. May be back-pressured when no receiver pulls the data 
 any more.
 * Sender may be cancelled by JobManager
 * Partition stays sane
 * Netty channel needs to be closed
 * Transfer needs to be canceled by a cancel message (receiver to sender)





[jira] [Resolved] (FLINK-1955) Improve error handling of sender task failures

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-1955.

Resolution: Fixed

Fixed in f75c16b.

 Improve error handling of sender task failures
 --

 Key: FLINK-1955
 URL: https://issues.apache.org/jira/browse/FLINK-1955
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi

 Currently, if a sender task fails, the produced partition is silently 
 released.
 We want:
 * Produced result partition becomes erroneous with a SenderFailedException
 * Receiver cancels itself when encountering the SenderFailedException
 * May also be cancelled by the JobManager (if that call is faster than the 
 detection of the failed sender)
 * This closes the Netty channel





[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107393190
  
Let us merge this for 0.9 and have it deactivated by default.

Let's gradually activate it in the next releases as it gets exposure




[jira] [Commented] (FLINK-2037) Unable to start Python API using ./bin/pyflink*.sh

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567139#comment-14567139
 ] 

ASF GitHub Bot commented on FLINK-2037:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/691#issuecomment-107391906
  
I'm going to merge this ...


 Unable to start Python API using ./bin/pyflink*.sh
 --

 Key: FLINK-2037
 URL: https://issues.apache.org/jira/browse/FLINK-2037
 Project: Flink
  Issue Type: Bug
  Components: Python API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Critical

 Calling {{./bin/pyflink3.sh}} will lead to
 {code}
 ./bin/pyflink3.sh
 log4j:WARN No appenders could be found for logger 
 (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
 info.
 JAR file does not exist: 
 /home/robert/incubator-flink/build-target/bin/../lib/flink-python-0.9-SNAPSHOT.jar
 Use the help option (-h or --help) to get help on the command.
 {code}
 This is due to the script expecting a {{flink-python-0.9-SNAPSHOT.jar}} file 
 to exist in {{lib}} (it's wrong anyway that the version name is included 
 here. That should be replaced by a {{*}}).
 I'll look into the issue ...





[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567399#comment-14567399
 ] 

ASF GitHub Bot commented on FLINK-1993:
---

Github user thvasilo closed the pull request at:

https://github.com/apache/flink/pull/725


 Replace MultipleLinearRegression's custom SGD with optimization framework's 
 SGD
 ---

 Key: FLINK-1993
 URL: https://issues.apache.org/jira/browse/FLINK-1993
 Project: Flink
  Issue Type: Task
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Theodore Vasiloudis
Priority: Minor
  Labels: ML
 Fix For: 0.9


 The current implementation of MultipleLinearRegression uses a custom SGD 
 implementation. Flink's optimization framework also contains a SGD optimizer 
 which should replace the custom implementation once the framework is merged.





[jira] [Commented] (FLINK-2101) Scheme Inference doesn't work for Tuple5

2015-06-01 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567536#comment-14567536
 ] 

Ufuk Celebi commented on FLINK-2101:


I think there is no problem, but Robert wanted to improve the Exception 
message. Robert, can you do this and then close the issue?

 Scheme Inference doesn't work for Tuple5
 

 Key: FLINK-2101
 URL: https://issues.apache.org/jira/browse/FLINK-2101
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: master
Reporter: Rico Bergmann
Assignee: Robert Metzger

 Calling addSink(new KafkaSink&lt;Tuple5&lt;String, String, String, Long, Double&gt;&gt;(
   "localhost:9092",
   "webtrends.ec1601",
   new Utils.TypeInformationSerializationSchema&lt;Tuple5&lt;String, String, String, Long, Double&gt;&gt;(
   new Tuple5&lt;String, String, String, Long, Double&gt;(),
   env.getConfig())));
 gives me an Exception stating that the generic type infos are not given.
 Exception in thread "main" 
 org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to 
 be parameterized by using generics.
   at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:396)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:339)
   at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:318)
   at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.&lt;init&gt;(ObjectArrayTypeInfo.java:45)
   at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(ObjectArrayTypeInfo.java:167)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1150)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1122)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1476)
   at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1446)
   at org.apache.flink.streaming.connectors.kafka.Utils$TypeInformationSerializationSchema.&lt;init&gt;(Utils.java:37)
   at de.otto.streamexample.WCExample.main(WCExample.java:132)
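For context on the exception above: Java erases generic type arguments at runtime, so a plain instance cannot reveal them to a type extractor. A minimal, JDK-only illustration (not Flink's actual TypeExtractor code) of why type arguments survive only in a subclass or field signature:

```java
import java.lang.reflect.ParameterizedType;
import java.util.AbstractMap.SimpleEntry;

// Illustration of type erasure: a plain instance reports only its raw class,
// while an anonymous subclass pins the type arguments in its generic
// superclass, where reflection can read them.
public class ErasureSketch {
    public static void main(String[] args) {
        // A plain instance: getClass() reports only the raw type parameters.
        SimpleEntry<String, Long> plain = new SimpleEntry<>("a", 1L);
        System.out.println(plain.getClass().getTypeParameters().length); // 2, but unbound

        // An anonymous subclass records the arguments in its generic
        // superclass signature -- the trick type extractors rely on.
        SimpleEntry<String, Long> pinned = new SimpleEntry<String, Long>("a", 1L) {};
        ParameterizedType sup =
                (ParameterizedType) pinned.getClass().getGenericSuperclass();
        System.out.println(sup.getActualTypeArguments()[0]); // class java.lang.String
    }
}
```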





[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567445#comment-14567445
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107589378
  
I reworked the sources now. Could someone please have another pass over 
this? I think this is very critical code.


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[GitHub] flink pull request: [FLINK-2121] Fix the summation in FileInputFor...

2015-06-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/752




[jira] [Commented] (FLINK-2101) Scheme Inference doesn't work for Tuple5

2015-06-01 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567489#comment-14567489
 ] 

Stephan Ewen commented on FLINK-2101:
-

So, is there any problem remaining, or can this issue be closed?

 Scheme Inference doesn't work for Tuple5
 

 Key: FLINK-2101
 URL: https://issues.apache.org/jira/browse/FLINK-2101
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: master
Reporter: Rico Bergmann
Assignee: Robert Metzger

 Calling addSink(new KafkaSink&lt;Tuple5&lt;String, String, String, Long, Double&gt;&gt;(
   "localhost:9092",
   "webtrends.ec1601",
   new Utils.TypeInformationSerializationSchema&lt;Tuple5&lt;String, String, String, Long, Double&gt;&gt;(
   new Tuple5&lt;String, String, String, Long, Double&gt;(),
   env.getConfig())));
 gives me an Exception stating that the generic type infos are not given.
 Exception in thread "main" 
 org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to 
 be parameterized by using generics.
   at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:396)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:339)
   at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:318)
   at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.&lt;init&gt;(ObjectArrayTypeInfo.java:45)
   at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(ObjectArrayTypeInfo.java:167)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1150)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1122)
   at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1476)
   at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1446)
   at org.apache.flink.streaming.connectors.kafka.Utils$TypeInformationSerializationSchema.&lt;init&gt;(Utils.java:37)
   at de.otto.streamexample.WCExample.main(WCExample.java:132)





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107613541
  
I'll make a pass




[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing &lt;/p&gt; s

2015-06-01 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567396#comment-14567396
 ] 

Ufuk Celebi commented on FLINK-2127:


Building it locally works fine. I guess it's a problem with the CI server, 
which builds the docs. I'm not familiar with the setup there, but it has 
probably another version of Jekyll/kramdown installed. Can you build it locally 
with ./docs/build_docs.sh -p and check 
http://localhost:4000/libs/gelly_guide.html#vertex-centric-iterations?

 The GSA Documentation has trailing &lt;/p&gt; s
 -

 Key: FLINK-2127
 URL: https://issues.apache.org/jira/browse/FLINK-2127
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Priority: Minor

 Within the GSA section of the documentation, there are trailing: &lt;p 
 class="text-center"&gt; image &lt;/p&gt;. 
 It would be nice to remove them :) 





[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567410#comment-14567410
 ] 

ASF GitHub Bot commented on FLINK-1430:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/753#issuecomment-107569766
  
+1 looks good to merge :smile: 


 Add test for streaming scala api completeness
 -

 Key: FLINK-1430
 URL: https://issues.apache.org/jira/browse/FLINK-1430
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Márton Balassi

 Currently the completeness of the streaming scala api is not tested.





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107589378
  
I reworked the sources now. Could someone please have another pass over 
this? I think this is very critical code.




[jira] [Closed] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size

2015-06-01 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2121.
---

 FileInputFormat.addFilesInDir miscalculates total size
 --

 Key: FLINK-2121
 URL: https://issues.apache.org/jira/browse/FLINK-2121
 Project: Flink
  Issue Type: Bug
  Components: Core
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor
 Fix For: 0.9


 In FileInputFormat.addFilesInDir, the length variable should start from 0, 
 because the return value is always used by adding it to the length (instead 
 of just assigning). So with the current version, the length before the call 
 will be seen twice in the result.
 mvn verify caught this for me now. The reason why this hasn't been seen yet, 
 is because testGetStatisticsMultipleNestedFiles catches this only if it gets 
 the listings of the outer directory in a certain order. Concretely, if the 
 inner directory is seen before the other file in the outer directory, then 
 length is 0 at that point, so the bug doesn't show. But if the other file is 
 seen first, then its size is added twice to the total result.
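The double-counting described above can be sketched in isolation. This is a hedged illustration (method and variable names invented, not the actual FileInputFormat code): the helper seeds its sum with the caller's running total, and the caller then *adds* the return value, so the prior total is counted twice.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the accumulation bug: the buggy helper starts from the caller's
// running total instead of 0, while the caller uses "total += helper(...)".
public class LengthBugSketch {

    // Buggy variant: seeds the sum with the running total passed in.
    static long addFilesInDirBuggy(List<Long> fileSizes, long length) {
        for (long size : fileSizes) {
            length += size;
        }
        return length; // caller does: total += addFilesInDirBuggy(...)
    }

    // Fixed variant: starts from 0, so "total += ..." is correct.
    static long addFilesInDirFixed(List<Long> fileSizes) {
        long length = 0;
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    public static void main(String[] args) {
        List<Long> dirContents = Arrays.asList(10L, 20L);
        long total = 100; // size of a file seen before the directory

        long buggy = total + addFilesInDirBuggy(dirContents, total);
        long fixed = total + addFilesInDirFixed(dirContents);

        System.out.println(buggy); // 230: the previous 100 is counted twice
        System.out.println(fixed); // 130
    }
}
```

As the issue notes, the result depends on directory listing order: if the directory is visited first, the running total is still 0 and the bug is invisible.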





[jira] [Resolved] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size

2015-06-01 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2121.
-
   Resolution: Fixed
Fix Version/s: 0.9

Fixed via 033409190235f93ed6d4e652214e7f35a34c3fe3

Thank you for the patch!

 FileInputFormat.addFilesInDir miscalculates total size
 --

 Key: FLINK-2121
 URL: https://issues.apache.org/jira/browse/FLINK-2121
 Project: Flink
  Issue Type: Bug
  Components: Core
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor
 Fix For: 0.9


 In FileInputFormat.addFilesInDir, the length variable should start from 0, 
 because the return value is always used by adding it to the length (instead 
 of just assigning). So with the current version, the length before the call 
 will be seen twice in the result.
 mvn verify caught this for me now. The reason why this hasn't been seen yet, 
 is because testGetStatisticsMultipleNestedFiles catches this only if it gets 
 the listings of the outer directory in a certain order. Concretely, if the 
 inner directory is seen before the other file in the outer directory, then 
 length is 0 at that point, so the bug doesn't show. But if the other file is 
 seen first, then its size is added twice to the total result.





[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567498#comment-14567498
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107613541
  
I'll make a pass


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[GitHub] flink pull request: [WIP] [FLINK-1993] [ml] - Replace MultipleLine...

2015-06-01 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/725#issuecomment-107563713
  
Closing as it gets superseded by the optimization framework 
[refactoring](https://github.com/apache/flink/pull/740)




[GitHub] flink pull request: [WIP] [FLINK-1993] [ml] - Replace MultipleLine...

2015-06-01 Thread thvasilo
Github user thvasilo closed the pull request at:

https://github.com/apache/flink/pull/725




[GitHub] flink pull request: [FLINK-1430] [streaming] Scala API completenes...

2015-06-01 Thread mbalassi
GitHub user mbalassi opened a pull request:

https://github.com/apache/flink/pull/753

[FLINK-1430] [streaming] Scala API completeness for streaming

Added the necessary checks and missing methods for streaming. 

Created an abstract base class for these completeness tests to have the 
base functionality in a central place.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mbalassi/flink flink-1430

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #753


commit 50c818d6d28c33546546812c78db7f47c1a32447
Author: mbalassi mbala...@apache.org
Date:   2015-06-01T14:56:15Z

[FLINK-1430] [streaming] Scala API completeness for streaming






[jira] [Closed] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource

2015-06-01 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2004.
---

 Memory leak in presence of failed checkpoints in KafkaSource
 

 Key: FLINK-2004
 URL: https://issues.apache.org/jira/browse/FLINK-2004
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Robert Metzger
Priority: Critical
 Fix For: 0.9


 Checkpoints that fail never send a commit message to the tasks.
 Maintaining a map of all pending checkpoints introduces a memory leak, as 
 entries for failed checkpoints will never be removed.
 Approaches to fix this:
   - The source cleans up entries from older checkpoints once a checkpoint is 
 committed (simple implementation in a linked hash map)
   - The commit message could include the optional state handle (source needs 
 not maintain the map)
   - The checkpoint coordinator could send messages for failed checkpoints?
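The first approach listed above (cleanup in a linked hash map once a newer checkpoint is committed) can be sketched as follows. All class and method names are invented for illustration; this is not the actual Flink fix:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Pending checkpoint state kept in insertion order: committing checkpoint N
// drops every entry with id <= N, including entries of failed checkpoints
// that will never receive a commit message -- closing the leak.
public class PendingCheckpointsSketch {
    private final Map<Long, String> pending = new LinkedHashMap<>();

    void onSnapshot(long checkpointId, String state) {
        pending.put(checkpointId, state);
    }

    void onCommit(long checkpointId) {
        // Insertion order == checkpoint order, so we can stop at the first
        // entry newer than the committed checkpoint.
        Iterator<Map.Entry<Long, String>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey() <= checkpointId) {
                it.remove(); // also removes leaked entries of failed checkpoints
            } else {
                break;
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingCheckpointsSketch s = new PendingCheckpointsSketch();
        s.onSnapshot(1, "a"); // this checkpoint fails: no commit ever arrives
        s.onSnapshot(2, "b");
        s.onCommit(2);        // cleans up both entries
        System.out.println(s.pendingCount()); // 0
    }
}
```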





[jira] [Commented] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567475#comment-14567475
 ] 

ASF GitHub Bot commented on FLINK-2004:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/674


 Memory leak in presence of failed checkpoints in KafkaSource
 

 Key: FLINK-2004
 URL: https://issues.apache.org/jira/browse/FLINK-2004
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Robert Metzger
Priority: Critical
 Fix For: 0.9


 Checkpoints that fail never send a commit message to the tasks.
 Maintaining a map of all pending checkpoints introduces a memory leak, as 
 entries for failed checkpoints will never be removed.
 Approaches to fix this:
   - The source cleans up entries from older checkpoints once a checkpoint is 
 committed (simple implementation in a linked hash map)
   - The commit message could include the optional state handle (source needs 
 not maintain the map)
   - The checkpoint coordinator could send messages for failed checkpoints?





[GitHub] flink pull request: [FLINK-2004] Fix memory leak in presence of fa...

2015-06-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/674




[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing &lt;/p&gt; s

2015-06-01 Thread Andra Lungu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567378#comment-14567378
 ] 

Andra Lungu commented on FLINK-2127:


Hi Ufuk, 

Nope, that's not what I meant :)
If you look at: 
http://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
 , the image for the second superstep speaks for itself!


 The GSA Documentation has trailing &lt;/p&gt; s
 -

 Key: FLINK-2127
 URL: https://issues.apache.org/jira/browse/FLINK-2127
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Priority: Minor

 Within the GSA section of the documentation, there are trailing: &lt;p 
 class="text-center"&gt; image &lt;/p&gt;. 
 It would be nice to remove them :) 





[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567304#comment-14567304
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107482998
  
The concern about logging is that, when using the local mode inside the 
IDE, the system logs a lot and the hints get lost.

If you don't want sysoutput, you could always deactivate the analysis.


 Add static code analysis for UDFs
 -

 Key: FLINK-1319
 URL: https://issues.apache.org/jira/browse/FLINK-1319
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Stephan Ewen
Assignee: Timo Walther
Priority: Minor

 Flink's Optimizer takes information that tells it for UDFs which fields of 
 the input elements are accessed, modified, or forwarded/copied. This 
 information frequently helps to reuse partitionings, sorts, etc. It may speed 
 up programs significantly, as it can frequently eliminate sorts and shuffles, 
 which are costly.
 Right now, users can add lightweight annotations to UDFs to provide this 
 information (such as adding {{@ConstantFields("0-&gt;3", "1", "2-&gt;1")}}).
 We worked with static code analysis of UDFs before, to determine this 
 information automatically. This is an incredible feature, as it magically 
 makes programs faster.
 For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
 works surprisingly well in many cases. We used the Soot toolkit for the 
 static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
 not include any of the code so far.
 I propose to add this functionality to Flink, in the form of a drop-in 
 addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
 simply download a special flink-code-analysis.jar and drop it into the 
 lib folder to enable this functionality. We may even add a script to 
 tools that downloads that library automatically into the lib folder. This 
 should be legally fine, since we do not redistribute LGPL code and only 
 dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
 patentability, if I remember correctly).
 Prior work on this has been done by [~aljoscha] and [~skunert], which could 
 provide a code base to start with.
 *Appendix*
 Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
 Papers on static analysis and for optimization: 
 http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
 http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
 Quick introduction to the Optimizer: 
 http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
 (Section 6)
 Optimizer for Iterations: 
 http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
 (Sections 4.3 and 5.3)
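As a toy illustration of how such lightweight UDF annotations can be consumed, here is a self-contained sketch of the reflective pattern. The annotation and class names below are invented for illustration and are not the Flink API:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// An optimizer-style consumer reads forwarded-field hints reflectively from
// the annotated UDF class instead of analyzing its bytecode.
public class AnnotationSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface ForwardedFieldsHint {
        String[] value();
    }

    // "0->3" means: input field 0 is copied unchanged into output field 3.
    @ForwardedFieldsHint({"0->3", "1", "2->1"})
    static class MyMapper { /* UDF body omitted */ }

    public static void main(String[] args) {
        ForwardedFieldsHint hint =
                MyMapper.class.getAnnotation(ForwardedFieldsHint.class);
        for (String f : hint.value()) {
            System.out.println(f);
        }
    }
}
```

The static-analysis feature discussed in this thread would derive exactly these hints automatically instead of requiring the user to write them.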





[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-107482998
  
The concern about logging is that, when using the local mode inside the 
IDE, the system logs a lot and the hints get lost.

If you don't want sysoutput, you could always deactivate the analysis.




[jira] [Resolved] (FLINK-2114) PunctuationPolicy.toString() throws NullPointerException if extractor is null

2015-06-01 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay resolved FLINK-2114.

Resolution: Fixed

 PunctuationPolicy.toString() throws NullPointerException if extractor is null
 -

 Key: FLINK-2114
 URL: https://issues.apache.org/jira/browse/FLINK-2114
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor

 Parentheses are missing in PunctuationPolicy.toString() around the conditional 
 operator checking for not null, which makes the condition always true.
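A minimal, self-contained sketch of this precedence bug (class and method names invented, not the actual PunctuationPolicy code): in Java, `+` binds tighter than `!=`, so without parentheses the null check applies to the concatenated string, which is never null.

```java
// Without parentheses, the condition parses as
// ("Policy(" + extractor) != null, which is always true, so the branch
// calling extractor.toString() runs even when extractor is null.
public class PolicySketch {
    final Object extractor; // may be null

    PolicySketch(Object extractor) {
        this.extractor = extractor;
    }

    // Buggy: always takes the toString() branch; NPE when extractor is null.
    String describeBuggy() {
        return "Policy(" + extractor != null ? extractor.toString() : "none";
    }

    // Fixed: parentheses make the condition test the extractor itself.
    String describeFixed() {
        return "Policy(" + (extractor != null ? extractor.toString() : "none");
    }

    public static void main(String[] args) {
        System.out.println(new PolicySketch(null).describeFixed()); // Policy(none
        try {
            new PolicySketch(null).describeBuggy();
        } catch (NullPointerException e) {
            System.out.println("NPE, as the issue describes");
        }
    }
}
```

Note the buggy variant also silently drops the "Policy(" prefix whenever the condition is taken, since the concatenation happens inside the discarded comparison.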





[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...

2015-06-01 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-107478713
  
I fixed the bug related to the non-null memory segment. The bug was caused by 
not clearing the `writeBehindBuffersAvailable` variable in the `close()` method of 
the `MutableHashTable` class. I added a test case for this bug.

I have a problem creating a test case for the bug related to the null 
memory segment. I think that it is also related to parallelism. If I run the 
ConnectedComponents example with parallelism 1, the bug does not occur. So I cannot 
reproduce the state without the ConnectedComponents example. 




[jira] [Commented] (FLINK-2111) Add terminate signal to cleanly stop streaming jobs

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567386#comment-14567386
 ] 

ASF GitHub Bot commented on FLINK-2111:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-107549742
  
Good idea!

Given the fact that we are stabilizing at the moment (and this is 
introducing new functionality), I vote to postpone that to after the 0.9 
release. That would be two weeks or so, if things go well.

Until then, here are a few thoughts

  - "Terminate" sounds like a very non-graceful killing. What this does is 
the opposite, so how about calling it a "stop" signal?
  - We will need a way to stop streaming jobs and perform a checkpoint 
during the stopping. It would be good to bear this in mind. Can we add this as 
part of this pull request?


 Add terminate signal to cleanly stop streaming jobs
 -

 Key: FLINK-2111
 URL: https://issues.apache.org/jira/browse/FLINK-2111
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime, JobManager, Local Runtime, 
 Streaming, TaskManager, Webfrontend
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 Currently, streaming jobs can only be stopped using the cancel command, which is 
 a hard stop with no clean shutdown.
 The newly introduced terminate signal will only affect streaming source 
 tasks, such that the sources can stop emitting data and terminate cleanly, 
 resulting in a clean termination of the whole streaming job.
 This feature is a pre-requirment for 
 https://issues.apache.org/jira/browse/FLINK-1929



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2111] Add terminate signal to cleanly...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-107549742
  
Good idea!

Given the fact that we are stabilizing at the moment (and this is 
introducing new functionality), I vote to postpone that to after the 0.9 
release. That would be two weeks or so, if things go well.

Until then, here are a few thoughts:

  - "Terminate" sounds like a very non-graceful kill. What this does is 
the opposite, so how about calling it a "stop" signal?
  - We will need a way to stop streaming jobs and perform a checkpoint 
during the stopping. It would be good to bear this in mind. Can we add this as 
part of this pull request?




[jira] [Created] (FLINK-2127) The GSA Documentation has trailing /p s

2015-06-01 Thread Andra Lungu (JIRA)
Andra Lungu created FLINK-2127:
--

 Summary: The GSA Documentation has trailing /p s
 Key: FLINK-2127
 URL: https://issues.apache.org/jira/browse/FLINK-2127
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Priority: Minor


Within the GSA Section of the documentation, there are trailing 
<p class="text-center image"></p> tags. 

It would be nice to remove them :) 







[jira] [Updated] (FLINK-2127) The GSA Documentation has trailing /p s

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-2127:
---
Component/s: Documentation

 The GSA Documentation has trailing /p s
 -

 Key: FLINK-2127
 URL: https://issues.apache.org/jira/browse/FLINK-2127
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Priority: Minor

 Within the GSA Section of the documentation, there are trailing 
 <p class="text-center image"></p> tags. 
 It would be nice to remove them :) 





[jira] [Commented] (FLINK-2076) Bug in re-openable hash join

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567296#comment-14567296
 ] 

ASF GitHub Bot commented on FLINK-2076:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-107478713
  
I fixed the bug related to the non-null memory segment. The bug was caused by 
the `writeBehindBuffersAvailable` variable not being cleared in the `close()` 
method of the `MutableHashTable` class. I added a test case for this bug.

I have a problem creating a test case for the bug related to the null 
memory segment. I think it is also related to parallelism. If I run the 
ConnectedComponents example with parallelism 1, the bug does not occur, so I 
cannot reproduce the state without the ConnectedComponents example. 
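The fix described above (clearing a counter on `close()` so a re-opened table does not see stale state) can be illustrated with a minimal sketch; the class below is a hypothetical simplification, not the actual `MutableHashTable` code:

```java
// Hypothetical simplification of the fix: a counter of write-behind buffers
// that must be reset when the table is closed, so a re-opened (iterative)
// hash join does not start from leftover state of the previous run.
public class WriteBehindCounter {
    private int writeBehindBuffersAvailable = 0;
    private final boolean clearOnClose; // true = fixed behavior, false = buggy

    public WriteBehindCounter(boolean clearOnClose) {
        this.clearOnClose = clearOnClose;
    }

    public void open(int buffers) {
        writeBehindBuffersAvailable += buffers;
    }

    public void close() {
        if (clearOnClose) {
            writeBehindBuffersAvailable = 0; // the fix: clear on close()
        }
        // the buggy variant left the counter untouched here
    }

    public int available() {
        return writeBehindBuffersAvailable;
    }
}
```

With the buggy variant, each open/close cycle inflates the counter, which matches the "memory buffers leaked" symptom in the stack trace below.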


 Bug in re-openable hash join
 

 Key: FLINK-2076
 URL: https://issues.apache.org/jira/browse/FLINK-2076
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Chiwan Park

 It happens deterministically in my machine with the following setup:
 TaskManager:
   - heap size: 512m
   - network buffers: 4096
   - slots: 32
 Job:
   - ConnectedComponents
   - 100k vertices
   - 1.2m edges
 -- this gives around 260 MB of Flink managed memory; across 32 slots that is 
 8 MB per slot, which, with several memory consumers in the job, makes the 
 iterative hash join go out-of-core
 {code}
 java.lang.RuntimeException: Hash Join bug in memory management: 
 Memory buffers leaked.
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
   at 
 org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
   at 
 org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102)
   at 
 org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
   at 
 org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
   at 
 org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560)
   at java.lang.Thread.run(Thread.java:745)
 {code}





[jira] [Commented] (FLINK-2120) Rename AbstractJobVertex to JobVertex

2015-06-01 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567329#comment-14567329
 ] 

Stephan Ewen commented on FLINK-2120:
-

+1

 Rename AbstractJobVertex to JobVertex
 -

 Key: FLINK-2120
 URL: https://issues.apache.org/jira/browse/FLINK-2120
 Project: Flink
  Issue Type: Wish
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Trivial

 I would like to rename AbstractJobVertex to JobVertex. It is not abstract, and 
 we have a lot of references to it in tests, where we create instances. The 
 change is trivial, but I think the current name is a bad one.





[jira] [Commented] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567328#comment-14567328
 ] 

ASF GitHub Bot commented on FLINK-2121:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/752#issuecomment-107498384
  
+1, should go into the release candidate.

Will merge this now...


 FileInputFormat.addFilesInDir miscalculates total size
 --

 Key: FLINK-2121
 URL: https://issues.apache.org/jira/browse/FLINK-2121
 Project: Flink
  Issue Type: Bug
  Components: Core
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor

 In FileInputFormat.addFilesInDir, the length variable should start from 0, 
 because the caller always adds the return value to its own length (instead 
 of just assigning it). So with the current version, the length accumulated 
 before the call is counted twice in the result.
 mvn verify caught this for me just now. The reason this hasn't been seen yet 
 is that testGetStatisticsMultipleNestedFiles catches it only if it gets the 
 listing of the outer directory in a certain order. Concretely, if the 
 inner directory is seen before the other file in the outer directory, then 
 length is 0 at that point, so the bug doesn't show. But if the other file is 
 seen first, then its size is added twice to the total result.
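The double-counting pattern described above can be reproduced with a small sketch (a hypothetical simplification of the accumulate-and-return pattern in addFilesInDir, not the actual Flink code):

```java
import java.util.Arrays;
import java.util.List;

public class SummationBug {
    // Buggy variant: the local sum starts from the caller's running total,
    // so when the caller then does "length += addFilesBuggy(...)", that total
    // is counted twice.
    public static long addFilesBuggy(List<Long> fileSizes, long length) {
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    // Fixed variant: the local sum starts from 0, and the caller adds it.
    public static long addFilesFixed(List<Long> fileSizes) {
        long length = 0;
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    public static void main(String[] args) {
        List<Long> innerDir = Arrays.asList(10L, 20L);
        long lengthBefore = 100L; // size of a file seen earlier in the outer dir

        long buggy = lengthBefore + addFilesBuggy(innerDir, lengthBefore);
        long fixed = lengthBefore + addFilesFixed(innerDir);

        System.out.println(buggy); // 230: lengthBefore counted twice
        System.out.println(fixed); // 130: the correct total
    }
}
```

This also mirrors why the test is order-dependent: if the inner directory is listed first, `lengthBefore` is 0 and both variants agree.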





[GitHub] flink pull request: [FLINK-2121] Fix the summation in FileInputFor...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/752#issuecomment-107498384
  
+1, should go into the release candidate.

Will merge this now...




[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...

2015-06-01 Thread HilmiYildirim
Github user HilmiYildirim commented on the pull request:

https://github.com/apache/flink/pull/695#issuecomment-107551663
  
I think it is good.




[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing /p s

2015-06-01 Thread Andra Lungu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567661#comment-14567661
 ] 

Andra Lungu commented on FLINK-2127:


Yep, building it locally works :). Nevertheless, the users see the online 
documentation, so I would still like to keep this open as a bug. The problem 
seems to be the CI server, yes. Can someone help me with more information? I'm 
sure we can find a way to fix this. 

 The GSA Documentation has trailing /p s
 -

 Key: FLINK-2127
 URL: https://issues.apache.org/jira/browse/FLINK-2127
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Priority: Minor

 Within the GSA Section of the documentation, there are trailing 
 <p class="text-center image"></p> tags. 
 It would be nice to remove them :) 





[jira] [Commented] (FLINK-1727) Add decision tree to machine learning library

2015-06-01 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567676#comment-14567676
 ] 

Sachin Goel commented on FLINK-1727:


A complete implementation, with both Gini gain and entropy gain, along with 
support for categorical and continuous attributes, is available here: 
https://github.com/apache/flink/pull/710

Two example data sets are also included in the test suite.
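For context, the Gini gain criterion mentioned above splits on the reduction of Gini impurity, 1 - sum(p_i^2), where p_i is the fraction of samples in class i; entropy gain uses -sum(p_i log p_i) instead. A minimal sketch of the impurity computation (an illustration, not code from the pull request):

```java
public class GiniImpurity {
    // Gini impurity of a label distribution: 1 - sum over classes of p_i^2.
    // 0 means a pure node (one class only); the maximum for k classes is
    // 1 - 1/k, reached at the uniform distribution.
    public static double gini(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) {
            total += c;
        }
        if (total == 0) {
            return 0.0; // empty node: treat as pure
        }
        double sumSquares = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }
}
```

The gain of a candidate split is then the parent's impurity minus the size-weighted average impurity of the children.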

 Add decision tree to machine learning library
 -

 Key: FLINK-1727
 URL: https://issues.apache.org/jira/browse/FLINK-1727
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Sachin Goel
  Labels: ML

 Decision trees are widely used for classification and regression tasks. Thus, 
 it would be worthwhile to add support for them to Flink's machine learning 
 library. 
 A streaming parallel decision tree learning algorithm has been proposed by 
 Ben-Haim and Tom-Tov [1]. This can maybe be adapted to a batch use case as 
 well. [2] contains an overview of different techniques for scaling inductive 
 learning algorithms up. A presentation of Spark's MLlib decision tree 
 implementation can be found in [3].
 Resources:
 [1] [http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]
 [2] 
 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.8226&rep=rep1&type=pdf]
 [3] 
 [http://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf]





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread mbalassi
Github user mbalassi commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31461903
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-twitter/src/main/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.java
 ---
@@ -1,322 +1,322 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.flink.streaming.connectors.twitter;
-
-import java.io.FileInputStream;
-import java.io.InputStream;
-import java.util.Properties;
-import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.LinkedBlockingQueue;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.flink.configuration.Configuration;
-import 
org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
-import org.apache.flink.streaming.api.functions.source.SourceFunction;
-import org.apache.flink.util.Collector;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import com.twitter.hbc.ClientBuilder;
-import com.twitter.hbc.core.Constants;
-import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint;
-import com.twitter.hbc.core.processor.StringDelimitedProcessor;
-import com.twitter.hbc.httpclient.BasicClient;
-import com.twitter.hbc.httpclient.auth.Authentication;
-import com.twitter.hbc.httpclient.auth.OAuth1;
-
-/**
- * Implementation of {@link SourceFunction} specialized to emit tweets from
- * Twitter. It can connect to Twitter Streaming API, collect tweets and
- */
-public class TwitterSource extends RichParallelSourceFunction<String> {
-
-   private static final Logger LOG = 
LoggerFactory.getLogger(TwitterSource.class);
-
-   private static final long serialVersionUID = 1L;
-   private String authPath;
-   private transient BlockingQueue<String> queue;
-   private int queueSize = 1;
-   private transient BasicClient client;
-   private int waitSec = 5;
-
-   private int maxNumberOfTweets;
-   private int currentNumberOfTweets;
-
-   private String nextElement = null;
-
-   private volatile boolean isRunning = false;
-
-   /**
-* Create {@link TwitterSource} for streaming
-* 
-* @param authPath
-*Location of the properties file containing the required
-*authentication information.
-*/
-   public TwitterSource(String authPath) {
-   this.authPath = authPath;
-   maxNumberOfTweets = -1;
-   }
-
-   /**
-* Create {@link TwitterSource} to collect finite number of tweets
-* 
-* @param authPath
-*Location of the properties file containing the required
-*authentication information.
-* @param numberOfTweets
-* 
-*/
-   public TwitterSource(String authPath, int numberOfTweets) {
-   this.authPath = authPath;
-   this.maxNumberOfTweets = numberOfTweets;
-   }
-
-   @Override
-   public void open(Configuration parameters) throws Exception {
-   initializeConnection();
-   currentNumberOfTweets = 0;
-   }
-
-   /**
-* Initialize Hosebird Client to be able to consume Twitter's Streaming 
API
-*/
-   private void initializeConnection() {
-
-   if (LOG.isInfoEnabled()) {
-   LOG.info("Initializing Twitter Streaming API 
connection");
-   }
-
-   queue = new LinkedBlockingQueue<String>(queueSize);
-
-   StatusesSampleEndpoint endpoint = new StatusesSampleEndpoint();
-   endpoint.stallWarnings(false);
-
-   Authentication auth = authenticate();
-
-   initializeClient(endpoint, auth);
-
-   if (LOG.isInfoEnabled()) {
-   LOG.info("Twitter Streaming API connection established 
successfully");
-   }
-   }
-
-   private OAuth1 authenticate() 

[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567853#comment-14567853
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user mbalassi commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31460859
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/api/persistent/PersistentKafkaSource.java
 ---
@@ -61,10 +63,14 @@
  *
  * Note that the autocommit feature of Kafka needs to be disabled for 
using this source.
  */
-public class PersistentKafkaSource<OUT> extends 
RichParallelSourceFunction<OUT> implements
+public class PersistentKafkaSource<OUT> extends RichSourceFunction<OUT> 
implements
ResultTypeQueryable<OUT>,
CheckpointCommitter,
+   ParallelSourceFunction<OUT>,
CheckpointedAsynchronously<long[]> {
--- End diff --

Please have a RichParallelSourceFunction instead.


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.
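One way to achieve the required alignment is to have the source update its state and emit the matching record under the same lock that the snapshotting code takes, so a barrier can never fall "inside" the state change that produced a record. The sketch below is a hypothetical illustration of that idea, not Flink's actual implementation:

```java
// Hypothetical aligned source: emission and snapshotting are mutually
// exclusive, so a drawn snapshot is always consistent with exactly the
// records emitted before the corresponding barrier.
public class AlignedSource {
    private final Object checkpointLock = new Object();
    private long offset = 0; // restorable source state (e.g. a read position)

    public void emitNext() {
        synchronized (checkpointLock) {
            offset++;
            // collect(record); // the record matching this state change would
            //                  // be emitted inside the same critical section
        }
    }

    public long snapshot() {
        synchronized (checkpointLock) {
            // state reflects everything emitted so far, nothing in-flight
            return offset;
        }
    }
}
```

Without the shared lock, a snapshot could be drawn between the state update and the emission, so replaying from it would duplicate or drop a record.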





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread mbalassi
Github user mbalassi commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31460859
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/api/persistent/PersistentKafkaSource.java
 ---
@@ -61,10 +63,14 @@
  *
  * Note that the autocommit feature of Kafka needs to be disabled for 
using this source.
  */
-public class PersistentKafkaSource<OUT> extends 
RichParallelSourceFunction<OUT> implements
+public class PersistentKafkaSource<OUT> extends RichSourceFunction<OUT> 
implements
ResultTypeQueryable<OUT>,
CheckpointCommitter,
+   ParallelSourceFunction<OUT>,
CheckpointedAsynchronously<long[]> {
--- End diff --

Please have a RichParallelSourceFunction instead.




[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567868#comment-14567868
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user mbalassi commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31461903
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-twitter/src/main/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.java
 ---
@@ -1,322 +1,322 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.flink.streaming.connectors.twitter;
-
-import java.io.FileInputStream;
-import java.io.InputStream;
-import java.util.Properties;
-import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.LinkedBlockingQueue;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.flink.configuration.Configuration;
-import 
org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
-import org.apache.flink.streaming.api.functions.source.SourceFunction;
-import org.apache.flink.util.Collector;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import com.twitter.hbc.ClientBuilder;
-import com.twitter.hbc.core.Constants;
-import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint;
-import com.twitter.hbc.core.processor.StringDelimitedProcessor;
-import com.twitter.hbc.httpclient.BasicClient;
-import com.twitter.hbc.httpclient.auth.Authentication;
-import com.twitter.hbc.httpclient.auth.OAuth1;
-
-/**
- * Implementation of {@link SourceFunction} specialized to emit tweets from
- * Twitter. It can connect to Twitter Streaming API, collect tweets and
- */
-public class TwitterSource extends RichParallelSourceFunction<String> {
-
-   private static final Logger LOG = 
LoggerFactory.getLogger(TwitterSource.class);
-
-   private static final long serialVersionUID = 1L;
-   private String authPath;
-   private transient BlockingQueue<String> queue;
-   private int queueSize = 1;
-   private transient BasicClient client;
-   private int waitSec = 5;
-
-   private int maxNumberOfTweets;
-   private int currentNumberOfTweets;
-
-   private String nextElement = null;
-
-   private volatile boolean isRunning = false;
-
-   /**
-* Create {@link TwitterSource} for streaming
-* 
-* @param authPath
-*Location of the properties file containing the required
-*authentication information.
-*/
-   public TwitterSource(String authPath) {
-   this.authPath = authPath;
-   maxNumberOfTweets = -1;
-   }
-
-   /**
-* Create {@link TwitterSource} to collect finite number of tweets
-* 
-* @param authPath
-*Location of the properties file containing the required
-*authentication information.
-* @param numberOfTweets
-* 
-*/
-   public TwitterSource(String authPath, int numberOfTweets) {
-   this.authPath = authPath;
-   this.maxNumberOfTweets = numberOfTweets;
-   }
-
-   @Override
-   public void open(Configuration parameters) throws Exception {
-   initializeConnection();
-   currentNumberOfTweets = 0;
-   }
-
-   /**
-* Initialize Hosebird Client to be able to consume Twitter's Streaming 
API
-*/
-   private void initializeConnection() {
-
-   if (LOG.isInfoEnabled()) {
-   LOG.info("Initializing Twitter Streaming API 
connection");
-   }
-
-   queue = new LinkedBlockingQueue<String>(queueSize);
-
-   StatusesSampleEndpoint endpoint = new StatusesSampleEndpoint();
-   endpoint.stallWarnings(false);
-
-   Authentication auth = authenticate();
-
- 

[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567903#comment-14567903
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31463699
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-twitter/src/main/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.java
 ---
@@ -1,322 +1,322 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.flink.streaming.connectors.twitter;
-
-import java.io.FileInputStream;
-import java.io.InputStream;
-import java.util.Properties;
-import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.LinkedBlockingQueue;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.flink.configuration.Configuration;
-import 
org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
-import org.apache.flink.streaming.api.functions.source.SourceFunction;
-import org.apache.flink.util.Collector;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import com.twitter.hbc.ClientBuilder;
-import com.twitter.hbc.core.Constants;
-import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint;
-import com.twitter.hbc.core.processor.StringDelimitedProcessor;
-import com.twitter.hbc.httpclient.BasicClient;
-import com.twitter.hbc.httpclient.auth.Authentication;
-import com.twitter.hbc.httpclient.auth.OAuth1;
-
-/**
- * Implementation of {@link SourceFunction} specialized to emit tweets from
- * Twitter. It can connect to Twitter Streaming API, collect tweets and
- */
-public class TwitterSource extends RichParallelSourceFunction<String> {
-
-   private static final Logger LOG = 
LoggerFactory.getLogger(TwitterSource.class);
-
-   private static final long serialVersionUID = 1L;
-   private String authPath;
-   private transient BlockingQueue<String> queue;
-   private int queueSize = 1;
-   private transient BasicClient client;
-   private int waitSec = 5;
-
-   private int maxNumberOfTweets;
-   private int currentNumberOfTweets;
-
-   private String nextElement = null;
-
-   private volatile boolean isRunning = false;
-
-   /**
-* Create {@link TwitterSource} for streaming
-* 
-* @param authPath
-*Location of the properties file containing the required
-*authentication information.
-*/
-   public TwitterSource(String authPath) {
-   this.authPath = authPath;
-   maxNumberOfTweets = -1;
-   }
-
-   /**
-* Create {@link TwitterSource} to collect finite number of tweets
-* 
-* @param authPath
-*Location of the properties file containing the required
-*authentication information.
-* @param numberOfTweets
-* 
-*/
-   public TwitterSource(String authPath, int numberOfTweets) {
-   this.authPath = authPath;
-   this.maxNumberOfTweets = numberOfTweets;
-   }
-
-   @Override
-   public void open(Configuration parameters) throws Exception {
-   initializeConnection();
-   currentNumberOfTweets = 0;
-   }
-
-   /**
-* Initialize Hosebird Client to be able to consume Twitter's Streaming 
API
-*/
-   private void initializeConnection() {
-
-   if (LOG.isInfoEnabled()) {
-   LOG.info("Initializing Twitter Streaming API 
connection");
-   }
-
-   queue = new LinkedBlockingQueue<String>(queueSize);
-
-   StatusesSampleEndpoint endpoint = new StatusesSampleEndpoint();
-   endpoint.stallWarnings(false);
-
-   Authentication auth = authenticate();
-
- 

[jira] [Commented] (FLINK-2101) Scheme Inference doesn't work for Tuple5

2015-06-01 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567813#comment-14567813
 ] 

Robert Metzger commented on FLINK-2101:
---

Yes, will do.

 Scheme Inference doesn't work for Tuple5
 

 Key: FLINK-2101
 URL: https://issues.apache.org/jira/browse/FLINK-2101
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: master
Reporter: Rico Bergmann
Assignee: Robert Metzger

 Calling addSink(new KafkaSink<Tuple5<String, String, String, Long, Double>>(
   "localhost:9092",
   "webtrends.ec1601",
   new Utils.TypeInformationSerializationSchema<Tuple5<String, String, String, Long, Double>>(
   new Tuple5<String, String, String, Long, Double>(),
   env.getConfig())));
 gives me an exception stating that the generic type infos are not given.
 Exception in thread "main" 
 org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to 
 be parameterized by using generics.
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:396)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:339)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:318)
   at 
 org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.<init>(ObjectArrayTypeInfo.java:45)
   at 
 org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(ObjectArrayTypeInfo.java:167)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1150)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1122)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1476)
   at 
 org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1446)
   at 
 org.apache.flink.streaming.connectors.kafka.Utils$TypeInformationSerializationSchema.<init>(Utils.java:37)
   at de.otto.streamexample.WCExample.main(WCExample.java:132)
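The exception stems from Java type erasure: a plain `new Tuple5<...>()` carries no actual type arguments at runtime, so the type extractor cannot recover the field types. The sketch below demonstrates the general problem and the anonymous-subclass trick that does preserve type arguments (a generic illustration using `ArrayList`, not Flink's TypeExtractor code):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

public class ErasureDemo {
    public static void main(String[] args) {
        // Plain instance: at runtime only the raw class survives, so the
        // actual type argument <String> is not recoverable from the object.
        ArrayList<String> plain = new ArrayList<String>();
        // ArrayList declares 1 formal type parameter, but no actual argument
        // is attached to this instance:
        System.out.println(plain.getClass().getTypeParameters().length); // 1

        // Anonymous subclass: the supertype's type argument is recorded in
        // the class file and can be read back reflectively - the trick that
        // type extractors rely on.
        ArrayList<String> anon = new ArrayList<String>() {};
        ParameterizedType superType =
                (ParameterizedType) anon.getClass().getGenericSuperclass();
        Type arg = superType.getActualTypeArguments()[0];
        System.out.println(arg); // prints: class java.lang.String
    }
}
```

This is why serialization schemas typically need the type information passed in explicitly instead of inferring it from a freshly constructed tuple instance.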





[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567848#comment-14567848
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user mbalassi commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31460503
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java
 ---
@@ -19,66 +19,81 @@
 package org.apache.flink.streaming.api.functions.source;
 
 import org.apache.flink.api.common.functions.Function;
+import org.apache.flink.util.Collector;
 
 import java.io.Serializable;
 
 /**
  * Base interface for all stream data sources in Flink. The contract of a 
stream source
- * is similar to an iterator - it is consumed as in the following pseudo 
code:
--- End diff --

Maybe "streaming data sources" instead of "stream data sources". Not to be 
confused with file streams or intermediate results.


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread mbalassi
Github user mbalassi commented on a diff in the pull request:

https://github.com/apache/flink/pull/742#discussion_r31460503
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java
 ---
@@ -19,66 +19,81 @@
 package org.apache.flink.streaming.api.functions.source;
 
 import org.apache.flink.api.common.functions.Function;
+import org.apache.flink.util.Collector;
 
 import java.io.Serializable;
 
 /**
  * Base interface for all stream data sources in Flink. The contract of a 
stream source
- * is similar to an iterator - it is consumed as in the following pseudo 
code:
--- End diff --

Maybe streaming data sources, instead of stream data sources. Not to be 
confused with file streams or intermediate results.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2128) ScalaShellITSuite failing

2015-06-01 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2128:
--

 Summary: ScalaShellITSuite failing
 Key: FLINK-2128
 URL: https://issues.apache.org/jira/browse/FLINK-2128
 Project: Flink
  Issue Type: Bug
  Components: Scala Shell
Affects Versions: master
Reporter: Ufuk Celebi


https://s3.amazonaws.com/archive.travis-ci.org/jobs/64947781/log.txt

{code}
ScalaShellITSuite:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /.log (Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at 
org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at 
org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at 
org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at 
org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at 
org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at 
org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at 
org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at 
org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at 
org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at 
org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288)
at 
org.apache.flink.test.util.TestBaseUtils.<clinit>(TestBaseUtils.java:69)
at 
org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:198)
at 
org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
at 
org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:33)
at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
at 
org.apache.flink.api.scala.ScalaShellITSuite.run(ScalaShellITSuite.scala:33)
at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
at 
org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
at 
org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526)
at 
org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29)
at org.scalatest.Suite$class.run(Suite.scala:1421)
at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
at 
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
at 
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
at 
org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
at 
org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
at org.scalatest.tools.Runner$.main(Runner.scala:860)
at org.scalatest.tools.Runner.main(Runner.scala)
{code}





[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s

2015-06-01 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567976#comment-14567976
 ] 

Ufuk Celebi commented on FLINK-2127:


This is definitely a bug. I didn't say that we should close it. ;-)

[~mxm], can you provide some info on the CI setup? Which version of Jekyll and 
kramdown is installed? Locally, I'm using kramdown 1.7.0 and jekyll 2.5.3.

 The GSA Documentation has trailing </p>s
 -

 Key: FLINK-2127
 URL: https://issues.apache.org/jira/browse/FLINK-2127
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Priority: Minor

 Within the GSA Section of the documentation, there are trailing 
 <p class="text-center"> image </p> fragments. 
 It would be nice to remove them :) 





[jira] [Resolved] (FLINK-2089) Buffer recycled IllegalStateException during cancelling

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-2089.

   Resolution: Fixed
Fix Version/s: 0.9

Fixed in 28eb274.

 Buffer recycled IllegalStateException during cancelling
 -

 Key: FLINK-2089
 URL: https://issues.apache.org/jira/browse/FLINK-2089
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 [~rmetzger] reported the following stack trace during cancelling of high 
 parallelism jobs:
 {code}
 Error: java.lang.IllegalStateException: Buffer has already been recycled.
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
 at 
 org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
 at 
 org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
 at 
 org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
 at 
 org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
 at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
 at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 This looks like a concurrent buffer pool release/buffer usage error. I'm 
 investigating this today.





[jira] [Resolved] (FLINK-1724) TestingCluster uses local communication with multiple task managers

2015-06-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-1724.

   Resolution: Fixed
Fix Version/s: 0.9

This has been fixed as I've said. The point you've raised is valid and needs a 
separate discussion on the mailing list.

 TestingCluster uses local communication with multiple task managers
 ---

 Key: FLINK-1724
 URL: https://issues.apache.org/jira/browse/FLINK-1724
 Project: Flink
  Issue Type: Bug
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Minor
 Fix For: 0.9


 Starting a task manager via TestingUtils does not respect the number of 
 configured task managers and mis-configures the task managers to use local 
 network communication (LocalConnectionManager instead of 
 NettyConnectionManager).





[jira] [Commented] (FLINK-1747) Remove deadlock detection and pipeline breaker placement in optimizer

2015-06-01 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568078#comment-14568078
 ] 

Ufuk Celebi commented on FLINK-1747:


[~StephanEwen], was this a duplicate of FLINK-2041?

 Remove deadlock detection and pipeline breaker placement in optimizer
 -

 Key: FLINK-1747
 URL: https://issues.apache.org/jira/browse/FLINK-1747
 Project: Flink
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Minor

 The deadlock detection in the optimizer, which places pipeline breaking 
 caches has become redundant with recently added changes. We now use blocking 
 data exchanges for branching programs, which are merged again at a later 
 point.
 Therefore, we can start removing the respective code in the optimizer.





[GitHub] flink pull request: [FLINK-2119] Add ExecutionGraph support for ba...

2015-06-01 Thread uce
GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/754

[FLINK-2119] Add ExecutionGraph support for batch scheduling

This PR adds support for a newly introduced scheduling mode 
`BATCH_FROM_SOURCES`. The goal for me was to make this change *minimally 
invasive* in order to not touch too much core code shortly before the release.

Essentially, this only touches two parts of the codebase: the scheduling 
action for blocking results and the job vertices.

If you set the scheduling mode to `BATCH_FROM_SOURCES`, you can manually 
configure which input vertices are used as the sources when scheduling 
(`setAsBatchSource`). You can then manually specify the successor vertices 
(`addBatchSuccessor`), which are scheduled after the blocking results are 
finished. When there are no successors specified manually, the result consumers 
are scheduled as before. Mixing pipelined and blocking results leads to 
unspecified behaviour currently (aka it's not a good idea to do this at the 
moment).

When you have something like this:
```
O sink
|
. --- denotes a pipelined result
O union
  +´|`+
  | | |
  ■ ■ ■ --- denotes a blocking result
  O O O
 src0  src1  src2
```
You can first schedule `src0`, `src1`, `src2`, and then continue with 
the `union-sink` pipeline.

```java
src[0].setAsBatchSource(); // src0 is the first to go...

src[0].addBatchSuccessors(src[1]); // src0 => src1

src[1].addBatchSuccessors(src[2]); // src1 => src2

src[2].addBatchSuccessors(union); // src2 => [union => sink]
```

@StephanEwen or @tillrohrmann will work on the Optimizer/JobGraph 
counterpart of this and will build the `JobGraph` for programs in batch mode 
using the methods introduced in this PR. Do you guys think that this minimal 
support is sufficient for the first version?

(Going over the result partition notification code, I really think it's 
pressing to refactor it. It is very very hard to understand. The corresponding 
issue [FLINK-1833](https://issues.apache.org/jira/browse/FLINK-1833) has been 
created a while back. I want to do this after the release.)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/incubator-flink legs-2119

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #754


commit 4ac15982700257d3deb2d55a389afd0531f7f8be
Author: Ufuk Celebi u...@apache.org
Date:   2015-06-01T21:12:47Z

[FLINK-2119] Add ExecutionGraph support for batch scheduling






[jira] [Closed] (FLINK-1747) Remove deadlock detection and pipeline breaker placement in optimizer

2015-06-01 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1747.
---

 Remove deadlock detection and pipeline breaker placement in optimizer
 -

 Key: FLINK-1747
 URL: https://issues.apache.org/jira/browse/FLINK-1747
 Project: Flink
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Minor

 The deadlock detection in the optimizer, which places pipeline breaking 
 caches has become redundant with recently added changes. We now use blocking 
 data exchanges for branching programs, which are merged again at a later 
 point.
 Therefore, we can start removing the respective code in the optimizer.





[jira] [Resolved] (FLINK-1747) Remove deadlock detection and pipeline breaker placement in optimizer

2015-06-01 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1747.
-
Resolution: Duplicate

Duplicate of FLINK-2041

 Remove deadlock detection and pipeline breaker placement in optimizer
 -

 Key: FLINK-1747
 URL: https://issues.apache.org/jira/browse/FLINK-1747
 Project: Flink
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Minor

 The deadlock detection in the optimizer, which places pipeline breaking 
 caches has become redundant with recently added changes. We now use blocking 
 data exchanges for branching programs, which are merged again at a later 
 point.
 Therefore, we can start removing the respective code in the optimizer.





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107751723
  
Looks good all in all. I am preparing a followup pull request that cleans 
up a few things, adds comments, and addresses Marton's comments.

One thing I noticed is that all the non-checkpointed sources have a 
checkpoint lock in the signature as well.

Should we offer two source interfaces: `SourceFunction` and 
`CheckpointedSourceFunction` ?




[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568279#comment-14568279
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107751723
  
Looks good all in all. I am preparing a followup pull request that cleans 
up a few things, adds comments, and addresses Marton's comments.

One thing I noticed is that all the non-checkpointed sources have a 
checkpoint lock in the signature as well.

Should we offer two source interfaces: `SourceFunction` and 
`CheckpointedSourceFunction` ?


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[GitHub] flink pull request: [FLINK-2098] Improvements on checkpoint-aligne...

2015-06-01 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/755

[FLINK-2098] Improvements on checkpoint-aligned sources

Based on #742 , include multiple cleanups and fixes on top.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink stream_sources

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #755


commit 152f4af6966e8a2e770f20a1f516d1de29c5b13e
Author: twalthr twal...@apache.org
Date:   2015-05-27T13:32:11Z

[hotfix] Remove execute() after print() in Table API examples

This closes #735

commit d82942a396777ea6debbf55a91916d4d5c3ecdaa
Author: Aljoscha Krettek aljoscha.kret...@gmail.com
Date:   2015-05-28T08:24:37Z

[FLINK-2098] Ensure checkpoints and element emission are in order

Before, it could happen that a streaming source would keep emitting
elements while a checkpoint is being performed. This can lead to
inconsistencies in the checkpoints with other operators.

This also adds a test that checks whether only one checkpoint is
executed at a time and the serial behaviour of checkpointing and
emission.

This changes the SourceFunction interface to have run()/cancel() methods
where the run() method takes a lock object on which it needs to
synchronize updates to state and emission of elements.
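
The locking contract described in this commit message can be sketched in 
plain Java as follows. `LockedCountSource` and its method signatures are 
illustrative stand-ins under the stated assumptions, not the actual 
`SourceFunction` API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the actual Flink API) of the contract above:
// state updates and element emission happen atomically under the shared
// checkpoint lock, so a snapshot never sees a half-applied update.
class LockedCountSource {
    private volatile boolean running = true;
    private long count = 0;

    // run() receives the lock it must synchronize on, as described above.
    public void run(Object checkpointLock, List<Long> out, long maxElements) {
        while (running && count < maxElements) {
            synchronized (checkpointLock) {
                count++;          // state update ...
                out.add(count);   // ... and emission, under the same lock
            }
        }
    }

    public void cancel() {
        running = false;
    }

    // A checkpoint would acquire the same lock before reading the state.
    public long snapshotState(Object checkpointLock) {
        synchronized (checkpointLock) {
            return count;
        }
    }

    public static void main(String[] args) {
        LockedCountSource source = new LockedCountSource();
        Object lock = new Object();
        List<Long> out = new ArrayList<>();
        source.run(lock, out, 5);
        System.out.println(out); // [1, 2, 3, 4, 5]
        System.out.println(source.snapshotState(lock)); // 5
    }
}
```

Because both emission and snapshot reads go through the same monitor, the 
snapshot is always consistent with the elements emitted so far.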

commit dad6a0092489fe7a9ef4508dc006d5a6cc42a2ba
Author: Stephan Ewen se...@apache.org
Date:   2015-06-02T01:31:43Z

[FLINK-2098] Improvements on checkpoint-aligned sources






[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568362#comment-14568362
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107768981
  
Some important comments:

  - Exceptions should always propagate, and exceptions during cancelling 
can be thrown. The `Task` class filters out exceptions that come after the 
transition to the canceling state. Don't try to be super smart there, 
propagate your exceptions, and the context will decide whether they should be 
logged.

  - The RabbitMQ source is not doing any proper failure handling 
(critical!). I opened a separate JIRA issue for that.

  - Just commenting out the twitter sources is bad style, in my opinion. 
There is actually no chance of getting this pull request in, and this one here 
is a release blocker, while the Twitter Source one is not a release blocker.

 - All functions set their `running` flag to true at the beginning of the 
`run()` method. In the case of races between invoking and canceling the source, 
it can mean that the flag is set to false in the cancel() method and then to 
true in the run() method, resulting in a non-canceled source. I fixed some 
cases in my follow up pull-request, but many are still in. Please make another 
careful pass with respect to that.

 - In general, the streaming sources are extremely badly commented. This 
needs big improvements!
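
The `running`-flag race described above can be reproduced with a minimal 
sketch; the fix shown (initializing the flag once at construction rather 
than at the top of `run()`) is one possible way to close it. Class names 
here are hypothetical, not taken from the Flink codebase:

```java
// Minimal reproduction of the race: RacySource.run() re-arms the flag,
// so a cancel() that happens before run() is silently overwritten.
class RacySource {
    private volatile boolean running;

    void run(Runnable body) {
        running = true;              // BUG: loses an earlier cancel()
        while (running) { body.run(); }
    }

    void cancel() { running = false; }
}

// One possible fix: initialize the flag once and only ever set it to
// false, so a cancel() arriving before run() keeps the source cancelled.
class SafeSource {
    private volatile boolean running = true;

    void run(Runnable body) {
        while (running) { body.run(); }
    }

    void cancel() { running = false; }
}

class RaceDemo {
    public static void main(String[] args) {
        SafeSource safe = new SafeSource();
        safe.cancel();               // cancel arrives before run()
        final int[] calls = {0};
        safe.run(() -> calls[0]++);  // never enters the loop
        System.out.println(calls[0]); // 0
    }
}
```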


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107768981
  
Some important comments:

  - Exceptions should always propagate, and exceptions during cancelling 
can be thrown. The `Task` class filters out exceptions that come after the 
transition to the canceling state. Don't try to be super smart there, 
propagate your exceptions, and the context will decide whether they should be 
logged.

  - The RabbitMQ source is not doing any proper failure handling 
(critical!). I opened a separate JIRA issue for that.

  - Just commenting out the twitter sources is bad style, in my opinion. 
There is actually no chance of getting this pull request in, and this one here 
is a release blocker, while the Twitter Source one is not a release blocker.

 - All functions set their `running` flag to true at the beginning of the 
`run()` method. In the case of races between invoking and canceling the source, 
it can mean that the flag is set to false in the cancel() method and then to 
true in the run() method, resulting in a non-canceled source. I fixed some 
cases in my follow up pull-request, but many are still in. Please make another 
careful pass with respect to that.

 - In general, the streaming sources are extremely badly commented. This 
needs big improvements!




[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568363#comment-14568363
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107769045
  
Here is my updated code: 
https://github.com/StephanEwen/incubator-flink/tree/stream_sources


 Checkpoint barrier initiation at source is not aligned with snapshotting
 

 Key: FLINK-2098
 URL: https://issues.apache.org/jira/browse/FLINK-2098
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 0.9


 The stream source does not properly align the emission of checkpoint barriers 
 with the drawing of snapshots.





[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-01 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568392#comment-14568392
 ] 

Sachin Goel commented on FLINK-1731:


I'm creating a separate issue for initialization schemes. This would address 
the Random, kmeans++ and kmeans|| initialization methods. Since any 
initialization is itself a solution to the kmeans problem, they would all be 
instances of Predictor as well. Users can access the centroids learned via 
instance.centroids and pass them to the KMeans algorithm which has been 
implemented. 
There is another possible way which takes the burden off the user of figuring 
out how to pass the initial centroids to KMeans. We can have a parameter which 
signifies which initialization scheme to use. The KMeans algorithm would then 
need to call the appropriate initialization scheme in its fit function and work 
with the centroids found by the initialization scheme as its initial centroids.

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf





[jira] [Created] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements

2015-06-01 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2130:
---

 Summary: RabbitMQ source does not fail when failing to retrieve 
elements
 Key: FLINK-2130
 URL: https://issues.apache.org/jira/browse/FLINK-2130
 Project: Flink
  Issue Type: Bug
Reporter: Stephan Ewen


The RMQ source only logs when elements cannot be retrieved. Failures are not 
propagated.





[jira] [Created] (FLINK-2131) Add Initialization schemes for K-means clustering

2015-06-01 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2131:
--

 Summary: Add Initialization schemes for K-means clustering
 Key: FLINK-2131
 URL: https://issues.apache.org/jira/browse/FLINK-2131
 Project: Flink
  Issue Type: Task
  Components: Machine Learning Library
Reporter: Sachin Goel
Assignee: Sachin Goel


The Lloyd's [KMeans] algorithm takes initial centroids as its input. However, 
in case the user doesn't provide the initial centers, they may ask for a 
particular initialization scheme to be followed. The most commonly used are 
these:
1. Random initialization: Self-explanatory
2. kmeans++ initialization: http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
3. kmeans|| : http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf

For very large data sets, or for large values of k, the kmeans|| method is 
preferred as it provides the same approximation guarantees as kmeans++ and 
requires fewer passes over the input data.
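
The kmeans++ seeding referenced above can be sketched in plain Java: the 
first center is drawn uniformly, each further center with probability 
proportional to the squared distance to its nearest already-chosen center. 
This is an illustrative standalone sketch, not the proposed flink-ml API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative k-means++ seeding. Assumes k <= number of distinct points.
class KMeansPlusPlus {

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    static List<double[]> seed(List<double[]> points, int k, Random rnd) {
        List<double[]> centers = new ArrayList<>();
        // first center: uniform over the input points
        centers.add(points.get(rnd.nextInt(points.size())));
        while (centers.size() < k) {
            // D(x)^2: squared distance to the nearest chosen center
            double[] weights = new double[points.size()];
            double total = 0;
            for (int i = 0; i < points.size(); i++) {
                double best = Double.MAX_VALUE;
                for (double[] c : centers) {
                    best = Math.min(best, sqDist(points.get(i), c));
                }
                weights[i] = best;
                total += best;
            }
            // sample index i with probability weights[i] / total
            double r = rnd.nextDouble() * total;
            int chosen = points.size() - 1; // fallback for rounding
            for (int i = 0; i < points.size(); i++) {
                r -= weights[i];
                if (r <= 0) { chosen = i; break; }
            }
            centers.add(points.get(chosen));
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        pts.add(new double[]{0, 0});
        pts.add(new double[]{1, 0});
        pts.add(new double[]{0, 1});
        pts.add(new double[]{10, 10});
        System.out.println(seed(pts, 2, new Random(7)).size()); // 2
    }
}
```

Already-chosen centers have weight zero, so with distinct points they are 
never re-selected; kmeans|| changes this sampling to oversample in a few 
passes and then recluster the candidates.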





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-01 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-107811407
  
I will fix the remaining cancel() races. The twitter stuff I just commented 
out because I assumed that the new TwitterSource could be merged right away and 
I was waiting for that.

As for two Source interfaces. We can certainly do that. The reason I didn't 
do it is because there would be a lot of duplication because we have 
SourceFunction, ParallelSourceFunction, RichSourceFunction and 
ParallelRichSourceFunction. With the new Source interface this would go up to 8 
interfaces for the sources. (That's also the reason why I didn't have Kafka 
derived from the RichParallelSourceFunction, I thought that maybe we could get 
rid of the special interfaces for parallel sources.)

Also, I realize there are many more problems. I just can't address them all 
in a single PR. :sweat_smile: 

