[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/926 [FLINK-2387] add streaming test case for live accumulators You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink live-accumulators-streaming Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #926 commit e0668dc4a5e1bf23085c40d7abae4c8414c29707 Author: Maximilian Michels m...@apache.org Date: 2015-07-21T14:54:26Z [FLINK-2387] add streaming test case for live accumulators --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635295#comment-14635295 ] ASF GitHub Bot commented on FLINK-2387: --- GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/926 [FLINK-2387] add streaming test case for live accumulators You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink live-accumulators-streaming Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #926 commit e0668dc4a5e1bf23085c40d7abae4c8414c29707 Author: Maximilian Michels m...@apache.org Date: 2015-07-21T14:54:26Z [FLINK-2387] add streaming test case for live accumulators Add test for live accumulators in Streaming --- Key: FLINK-2387 URL: https://issues.apache.org/jira/browse/FLINK-2387 Project: Flink Issue Type: Test Components: Streaming Reporter: Maximilian Michels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2375] Add Approximate Adamic Adar Simil...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/923#issuecomment-123394302 Hi @shghatge! The comments I left on #892 apply here as well. The difference would be that the neighborhoods of each of the neighbors will be represented as a bloom filter. It would also be nice to make the bloom filter parameters (size, number of hash functions, hash function) configurable, so that the user can adjust the false positives and size based on their use-case. What do you think?
[jira] [Created] (FLINK-2385) Scala DataSet.distinct should have parenthesis
Stephan Ewen created FLINK-2385: --- Summary: Scala DataSet.distinct should have parenthesis Key: FLINK-2385 URL: https://issues.apache.org/jira/browse/FLINK-2385 Project: Flink Issue Type: Bug Components: Scala API Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The method is not a side-effect free accessor, but defines heavy computation, even if it does not mutate the original data set. This is a somewhat API breaking change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API
Robert Metzger created FLINK-2386: - Summary: Implement Kafka connector using the new Kafka Consumer API Key: FLINK-2386 URL: https://issues.apache.org/jira/browse/FLINK-2386 Project: Flink Issue Type: Improvement Components: Kafka Connector Reporter: Robert Metzger Once Kafka has released its new consumer API, we should provide a connector for that version. The release will probably be called 0.9 or 0.8.3. The connector will be mostly compatible with Kafka 0.8.2.x, except for committing offsets to the broker (the new connector expects a coordinator to be available on Kafka). To work around that, we can provide a configuration option to commit offsets to zookeeper (managed by flink code). For 0.9/0.8.3 it will be fully compatible. It will not be compatible with 0.8.1 because of mismatching Kafka messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2382) Live Metric Reporting Does Not Work for Two-Input StreamTasks
[ https://issues.apache.org/jira/browse/FLINK-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-2382. --- Resolution: Fixed Fixed in 1d373a7. Live Metric Reporting Does Not Work for Two-Input StreamTasks - Key: FLINK-2382 URL: https://issues.apache.org/jira/browse/FLINK-2382 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Aljoscha Krettek Assignee: Maximilian Michels Also, there are no tests for the live metrics in streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-2371. --- Resolution: Fixed AccumulatorLiveITCase fails --- Key: FLINK-2371 URL: https://issues.apache.org/jira/browse/FLINK-2371 Project: Flink Issue Type: Bug Components: Tests Reporter: Matthias J. Sax Assignee: Maximilian Michels AccumulatorLiveITCase fails regularly (however, not in each run). The tests relies on timing (via sleep) which does not work well on Travis. See dev-list for more details: https://mail-archives.apache.org/mod_mbox/flink-dev/201507.mbox/browser AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2387) Add test for live accumulators in Streaming
Maximilian Michels created FLINK-2387: - Summary: Add test for live accumulators in Streaming Key: FLINK-2387 URL: https://issues.apache.org/jira/browse/FLINK-2387 Project: Flink Issue Type: Test Components: Streaming Reporter: Maximilian Michels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2376] Raise the time limit in testFindC...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/920#issuecomment-123359910 Looks good, +1 to merge.
[GitHub] flink pull request: Collect(): Fixing the akka.framesize size limi...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/887#issuecomment-123361546 Now that #896 is in, this becomes mergeable...
[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/720#issuecomment-123370928 Looks good. I will merge this...
[jira] [Commented] (FLINK-2375) Add Approximate Adamic Adar Similarity method using BloomFilters
[ https://issues.apache.org/jira/browse/FLINK-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635360#comment-14635360 ] ASF GitHub Bot commented on FLINK-2375: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/923#issuecomment-123394302 Hi @shghatge! The comments I left on #892 apply here as well. The difference would be that the neighborhoods of each of the neighbors will be represented as a bloom filter. It would also be nice to make the bloom filter parameters (size, number of hash functions, hash function) configurable, so that the user can adjust the false positives and size based on their use-case. What do you think? Add Approximate Adamic Adar Similarity method using BloomFilters Key: FLINK-2375 URL: https://issues.apache.org/jira/browse/FLINK-2375 Project: Flink Issue Type: Task Components: Gelly Reporter: Shivani Ghatge Assignee: Shivani Ghatge Priority: Minor Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a set of nodes. However, instead of counting the common neighbors and dividing them by the total number of neighbors, the similarity is weighted according to the vertex degrees. In particular, it's equal to log(1/numberOfEdges). The Adamic-Adar algorithm can be broken into three steps: 1) For each vertex, compute the log of its inverse degree (with the formula above) and set it as the vertex value. 2) Each vertex will then send this new computed value, along with a list of neighbors, to the targets of its out-edges. 3) Weigh the edges with the Adamic-Adar index: the sum over n in CN of log(1/k_n), where CN is the set of all common neighbors of two vertices x, y and k_n is the degree of node n. See [2]. Using BloomFilters we increase the scalability of the algorithm. The values calculated for the edges will be approximate. 
Prerequisites: Full understanding of the Jaccard Similarity Measure algorithm Reading the associated literature: [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf [2] http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
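The three steps described in the issue can be sketched in plain Java. The class and method names below are hypothetical, and the weights follow the log(1/k_n) formula stated in the issue; the Bloom-filter variant discussed in the comments would replace the exact neighbor-set lookup with an approximate membership test, trading false positives for memory.

```java
import java.util.Map;
import java.util.Set;

public class AdamicAdarSketch {
    // Step 1: vertex value = log(1 / degree), per the formula in the issue.
    static double vertexValue(int degree) {
        return Math.log(1.0 / degree);
    }

    // Step 3: edge weight for (x, y) = sum of vertex values over the common
    // neighbors of x and y. In the Bloom-filter variant, `neighborsY.contains(n)`
    // becomes an approximate membership test, so the weight is approximate.
    static double edgeWeight(Set<Integer> neighborsX, Set<Integer> neighborsY,
                             Map<Integer, Integer> degrees) {
        double sum = 0.0;
        for (int n : neighborsX) {
            if (neighborsY.contains(n)) {
                sum += vertexValue(degrees.get(n));
            }
        }
        return sum;
    }
}
```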
[GitHub] flink pull request: [FLINK-1943] [gelly] Added GSA compiler and tr...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/916#issuecomment-123362529 Super, +1 The test failures are unrelated. (One of the unstable tests is fixed already). Will merge this...
[jira] [Commented] (FLINK-1943) Add Gelly-GSA compiler and translation tests
[ https://issues.apache.org/jira/browse/FLINK-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635226#comment-14635226 ] ASF GitHub Bot commented on FLINK-1943: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/916#issuecomment-123362529 Super, +1 The test failures are unrelated. (One of the unstable tests is fixed already). Will merge this... Add Gelly-GSA compiler and translation tests Key: FLINK-1943 URL: https://issues.apache.org/jira/browse/FLINK-1943 Project: Flink Issue Type: Test Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri These should be similar to the corresponding Spargel tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2312) Random Splits
[ https://issues.apache.org/jira/browse/FLINK-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635328#comment-14635328 ] ASF GitHub Bot commented on FLINK-2312: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/921#issuecomment-123390786 This leads to non-mutually exclusive splits. I tracked down the reason for this: the input data is parallelized differently while performing the splits for every fraction. This leads to an altogether different sequence of random numbers, hence the problem. @tillrohrmann, I use a seed value to initialize, as in the cross-validation PR. Is there any way I can fix the parallelization of the data, so that running the split for every fraction leads to exactly the same sequence of numbers? Persist would write the data to disk, which is something of an overhead, isn't it? Random Splits - Key: FLINK-2312 URL: https://issues.apache.org/jira/browse/FLINK-2312 Project: Flink Issue Type: Wish Components: Machine Learning Library Reporter: Maximilian Alber Assignee: pietro pinoli Priority: Minor In machine learning applications it is common to split data sets into, e.g., training and testing sets. To the best of my knowledge there is at the moment no nice way in Flink to split a data set randomly into several partitions according to some ratio. The desired semantics would be the same as Spark's RDD randomSplit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2312][utils] Randomly Splitting a Data ...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/921#issuecomment-123390786 This leads to non-mutually exclusive splits. I tracked down the reason for this: the input data is parallelized differently while performing the splits for every fraction. This leads to an altogether different sequence of random numbers, hence the problem. @tillrohrmann, I use a seed value to initialize, as in the cross-validation PR. Is there any way I can fix the parallelization of the data, so that running the split for every fraction leads to exactly the same sequence of numbers? Persist would write the data to disk, which is something of an overhead, isn't it?
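One common way out of the order-dependence described in the comment is to derive each element's random draw from a hash of the global seed and the element itself, rather than from a sequential RNG, so the split decision does not depend on how the data happens to be parallelized. A minimal sketch with a hypothetical helper (not Flink's actual implementation):

```java
import java.util.Random;

public class RandomSplitSketch {
    // Decide whether `element` falls in the half-open fraction range [from, to).
    // The per-element RNG is seeded from (globalSeed, element), so the draw is
    // identical no matter how the DataSet is partitioned or in what order the
    // elements arrive.
    static boolean inSplit(long globalSeed, Object element, double from, double to) {
        long mixed = globalSeed ^ (element.hashCode() * 0x9E3779B97F4A7C15L);
        double draw = new Random(mixed).nextDouble();
        return draw >= from && draw < to;
    }
}
```

Because every split over a disjoint fraction range reuses the same per-element draw, the resulting splits are mutually exclusive by construction, without persisting the data to disk.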
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635206#comment-14635206 ] Till Rohrmann commented on FLINK-1901: -- I think this solution is indeed a little bit too hacky. It would be very unintuitive for the user to have to broadcast the iteration {{DataSet}} to the sampling operator. Furthermore, this would incur unnecessary network I/O. I think we should try to solve this problem properly. This means that we have a single sampling operator which works inside and outside of iterations. This will also avoid code duplication. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API
[ https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635243#comment-14635243 ] Robert Metzger commented on FLINK-2386: --- My current work in progress code is located here: https://github.com/rmetzger/flink/tree/flink2386 It contains a full copy of the new kafka consumer code (because it is not yet released). I'll probably finish the changes once Kafka has released the new consumer. Implement Kafka connector using the new Kafka Consumer API -- Key: FLINK-2386 URL: https://issues.apache.org/jira/browse/FLINK-2386 Project: Flink Issue Type: Improvement Components: Kafka Connector Reporter: Robert Metzger Once Kafka has released its new consumer API, we should provide a connector for that version. The release will probably be called 0.9 or 0.8.3. The connector will be mostly compatible with Kafka 0.8.2.x, except for committing offsets to the broker (the new connector expects a coordinator to be available on Kafka). To work around that, we can provide a configuration option to commit offsets to zookeeper (managed by flink code). For 0.9/0.8.3 it will be fully compatible. It will not be compatible with 0.8.1 because of mismatching Kafka messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser
[ https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635276#comment-14635276 ] ASF GitHub Bot commented on FLINK-2018: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/720#issuecomment-123370928 Looks good. I will merge this... Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser -- Key: FLINK-2018 URL: https://issues.apache.org/jira/browse/FLINK-2018 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Priority: Minor Labels: starter In FLINK-1525 we've added the {{ParameterTool}}. For users used to Hadoop's {{GenericOptionsParser}} it would be great to provide a compatible parser. See: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html {code} @Test public void testFromGenericOptionsParser() { ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(new String[]{"-D", "input=myinput", "-DexpectedCount=15"}); validate(parameter); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635537#comment-14635537 ] ASF GitHub Bot commented on FLINK-2322: --- GitHub user tedyu opened a pull request: https://github.com/apache/flink/pull/928 FLINK-2322 Unclosed stream may leak resource What size should I use for tab ? You can merge this pull request into a Git repository by running: $ git pull https://github.com/tedyu/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/928.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #928 commit c4a16ce1910b7ac53ee63a4ac9c1d85299e16204 Author: tedyu yuzhih...@gmail.com Date: 2015-07-21T18:13:17Z FLINK-2322 Unclosed stream may leak resource Unclosed stream may leak resource - Key: FLINK-2322 URL: https://issues.apache.org/jira/browse/FLINK-2322 Project: Flink Issue Type: Bug Reporter: Ted Yu Labels: starter In UdfAnalyzerUtils.java : {code} ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader() .getResourceAsStream(internalClassName.replace('.', '/') + .class)); {code} The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode() In ParameterTool#fromPropertiesFile(): {code} props.load(new FileInputStream(propertiesFile)); {code} The FileInputStream should be closed before returning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
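For the second snippet in the issue, the usual fix looks like the sketch below (assuming Java 7+ and using a hypothetical class name): try-with-resources guarantees the stream is closed even when load() throws.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class PropertiesLoader {
    // The FileInputStream is closed automatically when the try block exits,
    // whether load() succeeds or throws, so no file handle can leak.
    static Properties fromPropertiesFile(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```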
[jira] [Commented] (FLINK-2205) Confusing entries in JM Webfrontend Job Configuration section
[ https://issues.apache.org/jira/browse/FLINK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635462#comment-14635462 ] ASF GitHub Bot commented on FLINK-2205: --- GitHub user ebautistabar opened a pull request: https://github.com/apache/flink/pull/927 [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ebautistabar/flink change-job-config-display Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #927 commit ba7534ce56d5329a25d484d0e06aa36363781a65 Author: Enrique Bautista ebautista...@gmail.com Date: 2015-07-21T16:52:05Z [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA. Confusing entries in JM Webfrontend Job Configuration section - Key: FLINK-2205 URL: https://issues.apache.org/jira/browse/FLINK-2205 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Labels: starter The Job Configuration section of the job history / analyze page of the JobManager webinterface contains two confusing entries: - {{Number of execution retries}} is actually the maximum number of retries and should be renamed accordingly. The default value is -1 and should be changed to deactivated (or 0). - {{Job parallelism}} which is -1 by default. A parallelism of -1 is not very meaningful. It would be better to show something like auto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2388) JobManager should try retrieving jobs from archive
Enrique Bautista Barahona created FLINK-2388: Summary: JobManager should try retrieving jobs from archive Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635396#comment-14635396 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-123402206 I see your point @shghatge. However, I think naming just one method differently will be confusing.. If we're going to have custom method names, let's go with @andralungu's suggestion above and make sure we document these properly. I would prefer a bit shorter method names though. How about: 1). `keyType(K)` 2). `vertexTypes(K, VV)` 3). `edgeTypes(K, EV)` 4). `types(K, VV, EV)` ? Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2205] Fix confusing entries in JM UI Jo...
GitHub user ebautistabar opened a pull request: https://github.com/apache/flink/pull/927 [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ebautistabar/flink change-job-config-display Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #927 commit ba7534ce56d5329a25d484d0e06aa36363781a65 Author: Enrique Bautista ebautista...@gmail.com Date: 2015-07-21T16:52:05Z [FLINK-2205] Fix confusing entries in JM UI Job Config. section Default display for 'Number of execution retries' is now 'deactivated' and for 'Job parallelism' is 'auto', as suggested in JIRA.
[jira] [Updated] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrique Bautista Barahona updated FLINK-2388: - Description: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. was: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve ti shortly, please feel free to close the issue. 
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636124#comment-14636124 ] Chengxiang Li commented on FLINK-1901: -- "Every point is sampled with probability 1/N" is one of the sampling cases (sampling with fraction, without replacement); there are three other kinds of sampling that are commonly used as well: sampling with fraction, with replacement; sampling with fixed size, without replacement; and sampling with fixed size, with replacement. We should support all of them when exposing a sampling operator to the user. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
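The fraction-without-replacement case discussed in the comment above can be sketched as one independent Bernoulli trial per element (a hypothetical helper, not the actual Flink operator): no knowledge of the total input size is needed, and each element appears at most once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BernoulliSampler {
    // Sampling with fraction, without replacement: each element is kept
    // independently with probability `fraction`. The sample size is therefore
    // random, with expectation fraction * inputSize.
    static <T> List<T> sample(Iterable<T> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> sampled = new ArrayList<>();
        for (T element : input) {
            if (rnd.nextDouble() < fraction) {
                sampled.add(element);
            }
        }
        return sampled;
    }
}
```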
[jira] [Comment Edited] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634622#comment-14634622 ] Chengxiang Li edited comment on FLINK-1901 at 7/22/15 2:05 AM: --- To randomly choose a sample from a DataSet S, there exist two kinds of sample requirements: sampling with fraction (such as randomly choosing 5% of the items in S) and sampling with fixed size (such as randomly choosing 100 items from S). Besides, we do not know the size of S unless we take the extra cost to compute it through DataSet::count(). # Sampling with fraction #* With replacement: the expected sample size follows a [Poisson Distribution|https://en.wikipedia.org/wiki/Poisson_distribution] in this case, so Poisson Sampling can be used to choose the sample items. #* Without replacement: during sampling, we can treat each item in the iterator as a [Bernoulli Trial|https://en.wikipedia.org/wiki/Bernoulli_trial]. # Sampling with fixed size #* Use DataSet::count() to get the dataset size; with the fixed size, we can turn this into sampling with fraction. #* [Reservoir Sampling|https://en.wikipedia.org/wiki/Reservoir_sampling] is another commonly used algorithm to randomly choose a sample of k items from a list S containing n items, where n is either very large or unknown, and there are different reservoir sampling algorithms that support both sampling with replacement and sampling without replacement. was (Author: chengxiang li): To randomly choose a sample from a DataSet S, there exist two kinds of sample requirements: sampling with factor (such as randomly choosing 5% of the items in S) and sampling with fixed size (such as randomly choosing 100 items from S). Besides, we do not know the size of S unless we take the extra cost to compute it through DataSet::count(). 
# Sampling with factor #* With replacement: the expected sample size follow [Poisson Distribution|https://en.wikipedia.org/wiki/Poisson_distribution] in this case, so Poisson Sampling can be used to choose the sample items. #* Without replacement: during sampling, we can take the sample of each item in iterator as a [Bernoulli Trial|https://en.wikipedia.org/wiki/Bernoulli_trial]. # Sampling with fixed size #* Use DataSet::count() to get the dataset size, with the fixed size, we can turn this into sampling with factor. #* [Reservoir Sampling|https://en.wikipedia.org/wiki/Reservoir_sampling] is another commonly used algorithm to randomly choose a sample of k items from a list S containing n items, where n is either a very large or unknown number, and there are different reservoir sampling algorithms that support reservoir support both sampling with replacement and sampling without replacement. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
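Two of the sampling cases enumerated above can be sketched in a few lines of plain Java (illustrative only; the class and method names are invented and this is not the Flink operator): a Bernoulli trial per element covers "sampling with fraction, without replacement", and Algorithm R reservoir sampling covers "sampling with fixed size, without replacement" when the input size is unknown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SamplingSketch {

    // Sampling with fraction, without replacement: keep each element
    // with probability `fraction` (an independent Bernoulli trial per element).
    static List<Integer> bernoulliSample(List<Integer> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (Integer item : input) {
            if (rnd.nextDouble() < fraction) {
                out.add(item);
            }
        }
        return out;
    }

    // Sampling with fixed size k, without replacement, from an input of
    // unknown length (Algorithm R, reservoir sampling): after seeing i
    // elements, each of them is in the reservoir with probability k/i.
    static List<Integer> reservoirSample(Iterable<Integer> input, int k, long seed) {
        Random rnd = new Random(seed);
        List<Integer> reservoir = new ArrayList<>(k);
        int i = 0;
        for (Integer item : input) {
            if (i < k) {
                reservoir.add(item);
            } else {
                int j = rnd.nextInt(i + 1); // uniform in [0, i]
                if (j < k) {
                    reservoir.set(j, item);  // replace a random reservoir slot
                }
            }
            i++;
        }
        return reservoir;
    }
}
```

Note the contrast the comment draws: the Bernoulli variant returns a sample whose size is only n·fraction in expectation, while the reservoir variant always returns exactly min(k, n) items.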
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636332#comment-14636332 ] Chengxiang Li commented on FLINK-1901: -- I wrote a simple example of the sampling operator [here|https://github.com/ChengXiangLi/flink/blob/FLINK-1901/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/TestSample.java]; it works just as expected inside or outside of an iteration (for example, with 1000 items and a sample fraction of 0.5, after 3 iterations the output contains around 125 items). [~trohrm...@apache.org], I'm not sure whether I understood you correctly about sampling inside an iteration; it looks the same to me if that is the case in the example. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
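The figure of "around 125 items" follows from applying the fraction three times: 1000 × 0.5³ = 125 in expectation. A self-contained sketch of that repeated Bernoulli sampling (a hypothetical helper, not the linked TestSample code) shows the same concentration around the expected size:

```java
import java.util.Random;

class IterativeSampling {
    // Apply Bernoulli sampling with the given fraction `iterations` times
    // and return the surviving element count (illustrative, not Flink code).
    static int sampleRepeatedly(int n, double fraction, int iterations, long seed) {
        Random rnd = new Random(seed);
        int count = n;
        for (int it = 0; it < iterations; it++) {
            int survivors = 0;
            for (int i = 0; i < count; i++) {
                if (rnd.nextDouble() < fraction) survivors++; // one trial per item
            }
            count = survivors;
        }
        return count;
    }
}
```

The expected output size is n·fraction^iterations, so the observed count fluctuates around 125 for n = 1000, fraction = 0.5, and 3 iterations.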
[jira] [Created] (FLINK-2391) Storm-compatibility: method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
Huang Wei created FLINK-2391: Summary: Storm-compatibility: method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: Windows 7 32-bit; Linux Reporter: Huang Wei Fix For: 0.10 A NullPointerException is thrown at FlinkOutputFieldsDeclarer.java:160 (class FlinkOutputFieldsDeclarer). Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); On this line, the variable this.outputSchema may be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
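A defensive fix of the kind the report implies, failing fast with a clear message when the schema was never set, might look like the following simplified sketch (class, field, and method names are stand-ins modeled on the report, not the actual Flink sources):

```java
import java.util.Arrays;
import java.util.List;

class OutputFieldsGuardSketch {
    private final List<String> outputSchema; // may be null if no schema was ever declared

    OutputFieldsGuardSketch(List<String> outputSchema) {
        this.outputSchema = outputSchema;
    }

    // Resolve grouping fields to schema indexes, throwing a descriptive
    // IllegalStateException instead of an opaque NullPointerException
    // when no output schema was declared.
    int[] fieldIndexes(List<String> groupingFields) {
        if (this.outputSchema == null) {
            throw new IllegalStateException(
                "No output schema declared; cannot resolve grouping fields " + groupingFields);
        }
        int[] indexes = new int[groupingFields.size()];
        for (int i = 0; i < indexes.length; i++) {
            indexes[i] = this.outputSchema.indexOf(groupingFields.get(i));
        }
        return indexes;
    }
}
```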
[jira] [Closed] (FLINK-2362) distinct is missing in DataSet API documentation
[ https://issues.apache.org/jira/browse/FLINK-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2362. --- distinct is missing in DataSet API documentation Key: FLINK-2362 URL: https://issues.apache.org/jira/browse/FLINK-2362 Project: Flink Issue Type: Bug Components: Documentation, Java API, Scala API Affects Versions: 0.9, 0.10 Reporter: Fabian Hueske Assignee: pietro pinoli Fix For: 0.10, 0.9.1 The DataSet transformation {{distinct}} is not described or listed in the documentation. It is not contained in the DataSet API programming guide (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html) and not in the DataSet Transformation (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/dataset_transformations.html) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2376) testFindConnectableAddress sometimes fails on Windows because of the time limit
[ https://issues.apache.org/jira/browse/FLINK-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2376. - Resolution: Fixed Fix Version/s: 0.10 Fixed via bbbfd22077623633363d656d66a00028e7fea631 testFindConnectableAddress sometimes fails on Windows because of the time limit --- Key: FLINK-2376 URL: https://issues.apache.org/jira/browse/FLINK-2376 Project: Flink Issue Type: Bug Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2377) AbstractTestBase.deleteAllTempFiles sometimes fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2377. --- AbstractTestBase.deleteAllTempFiles sometimes fails on Windows -- Key: FLINK-2377 URL: https://issues.apache.org/jira/browse/FLINK-2377 Project: Flink Issue Type: Bug Components: Tests Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.10 This is probably another file closing issue. (that is, Windows won't delete open files, as opposed to Linux) I have encountered two concrete tests so far where this actually appears: CsvOutputFormatITCase and CollectionSourceTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2198) BlobManager tests fail on Windows
[ https://issues.apache.org/jira/browse/FLINK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2198. - Resolution: Fixed Assignee: Gabor Gevay Fix Version/s: 0.10 Fixed via 50344d70deb6348923f42a1ddaaf0e570708bc71 BlobManager tests fail on Windows - Key: FLINK-2198 URL: https://issues.apache.org/jira/browse/FLINK-2198 Project: Flink Issue Type: Bug Components: Build System, Tests Affects Versions: 0.9 Environment: Windows 8.1, Java 7, Maven 3.3.3 Reporter: Fabian Hueske Assignee: Gabor Gevay Fix For: 0.10 Building Flink on Windows using {{mvn clean install}} fails with the following error: {code} BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null BlobServerPutTest.testPutNamedBufferFails:286 null JobManagerStartupTest.before:55 null JobManagerStartupTest.before:55 null DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2198) BlobManager tests fail on Windows
[ https://issues.apache.org/jira/browse/FLINK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2198. --- BlobManager tests fail on Windows - Key: FLINK-2198 URL: https://issues.apache.org/jira/browse/FLINK-2198 Project: Flink Issue Type: Bug Components: Build System, Tests Affects Versions: 0.9 Environment: Windows 8.1, Java 7, Maven 3.3.3 Reporter: Fabian Hueske Assignee: Gabor Gevay Fix For: 0.10 Building Flink on Windows using {{mvn clean install}} fails with the following error: {code} BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null BlobServerPutTest.testPutNamedBufferFails:286 null JobManagerStartupTest.before:55 null JobManagerStartupTest.before:55 null DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser
[ https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2018. --- Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser -- Key: FLINK-2018 URL: https://issues.apache.org/jira/browse/FLINK-2018 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Priority: Minor Labels: starter Fix For: 0.10 In FLINK-1525 we've added the {{ParameterTool}}. For users used to Hadoop's {{GenericOptionsParser}} it would be great to provide a compatible parser. See: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html {code} @Test public void testFromGenericOptionsParser() { ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(new String[]{-D, input=myinput, -DexpectedCount=15}); validate(parameter); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
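A minimal sketch of what GenericOptionsParser-style argument handling involves, accepting both "-D key=value" (separate tokens) and "-Dkey=value" (a single token) and returning a plain map (this is an illustration only, not the proposed ParameterUtil API):

```java
import java.util.HashMap;
import java.util.Map;

class GenericOptionsSketch {
    // Parse Hadoop-style generic options: both "-D key=value" and
    // "-Dkey=value" set the entry key -> value in the result map.
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String kv = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                kv = args[++i];            // value is the next token
            } else if (arg.startsWith("-D")) {
                kv = arg.substring(2);     // value is fused into the token
            }
            if (kv != null) {
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    result.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
        }
        return result;
    }
}
```

This mirrors the argument shape in the quoted test case: {-D, input=myinput, -DexpectedCount=15}.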
[jira] [Resolved] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser
[ https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2018. - Resolution: Fixed Fix Version/s: 0.10 Fixed via 148395bcd81a93bcb1473e4e93f267edb3b71c7e Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser -- Key: FLINK-2018 URL: https://issues.apache.org/jira/browse/FLINK-2018 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Priority: Minor Labels: starter Fix For: 0.10 In FLINK-1525 we've added the {{ParameterTool}}. For users used to Hadoop's {{GenericOptionsParser}} it would be great to provide a compatible parser. See: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html {code} @Test public void testFromGenericOptionsParser() { ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(new String[]{-D, input=myinput, -DexpectedCount=15}); validate(parameter); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2389) Add dashboard frontend architecture and build infrastructure
[ https://issues.apache.org/jira/browse/FLINK-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2389. - Resolution: Fixed Fixed via d59cebd8c3f643dc6da88924300e78f65f26c640 and ea2b1b139eb221b2b1a81ee776a04e711c3f8af4 (caveat: wrong issue number accidentally committed) Add dashboard frontend architecture and build infrastructure --- Key: FLINK-2389 URL: https://issues.apache.org/jira/browse/FLINK-2389 Project: Flink Issue Type: Sub-task Components: Webfrontend Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 Add the build infrastructure and basic libraries for a modern and modular web dashboard. The dashboard is going to be built using angular.js. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2389) Add dashboard frontend architecture and build infrastructure
Stephan Ewen created FLINK-2389: --- Summary: Add dashboard frontend architecture and build infrastructure Key: FLINK-2389 URL: https://issues.apache.org/jira/browse/FLINK-2389 Project: Flink Issue Type: Sub-task Components: Webfrontend Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 Add the build infrastructure and basic libraries for a modern and modular web dashboard. The dashboard is going to be built using angular.js. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635667#comment-14635667 ] ASF GitHub Bot commented on FLINK-2322: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/928#issuecomment-123453516 You should use tabs ;-) Otherwise this is a good fix. To optimize it: You need not close the stream in two places, the finally clause will do it anyways. Unclosed stream may leak resource - Key: FLINK-2322 URL: https://issues.apache.org/jira/browse/FLINK-2322 Project: Flink Issue Type: Bug Reporter: Ted Yu Labels: starter In UdfAnalyzerUtils.java : {code} ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader() .getResourceAsStream(internalClassName.replace('.', '/') + .class)); {code} The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode() In ParameterTool#fromPropertiesFile(): {code} props.load(new FileInputStream(propertiesFile)); {code} The FileInputStream should be closed before returning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
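The fix Stephan describes, closing the stream in one place only, is most naturally written with try-with-resources (Java 7+), which closes the stream whether the block exits normally or with an exception. A sketch of the ParameterTool#fromPropertiesFile case from the report (the class and method here are standalone stand-ins, not the real ParameterTool):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class PropertiesLoadSketch {
    // The stream is closed automatically when the try block exits --
    // no explicit close() call and no finally clause needed.
    static Properties fromPropertiesFile(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```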
[jira] [Updated] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrique Bautista Barahona updated FLINK-2388: - Description: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b was: I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. 
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1228) Add REST Interface to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635638#comment-14635638 ] ASF GitHub Bot commented on FLINK-1228: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/297 Add REST Interface to JobManager Key: FLINK-1228 URL: https://issues.apache.org/jira/browse/FLINK-1228 Project: Flink Issue Type: Improvement Reporter: Arvid Heise For rolling out jobs to an external cluster, we currently have 3 choices: a) Manual submission with Web Interface b) Automatic/Manual submission with CLClient c) Automatic submission with custom client I propose to add a way to submit jobs automatically through an HTTP REST interface. Among other benefits, this extension allows an automatic submission of jobs through a restrictive proxy. Rough idea: The web interface would offer a REST entry point, for example /jobs. POSTing to this entry point allows the submission of a new job and returns the job URL. http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html GETting the job URL returns a small status. DELETING the job URL aborts the job. GETting /jobs returns a list of active and scheduled jobs. Since Flink already has a Jetty web server and uses Json for other services, the basic extension should require low effort. It would help Flink to be used inside larger corporations and align the interfaces with the other state-of-the-art MapReduce systems (S3, HDFS, HBase all have HTTP interfaces). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
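The rough idea above (POST to /jobs returns the new job URL, GET lists jobs) can be sketched with the JDK's built-in com.sun.net.httpserver, independent of Flink's actual Jetty setup; every name here is invented for illustration and nothing below is the Flink API:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class JobRestSketch {
    private final ConcurrentHashMap<Integer, String> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();
    private HttpServer server;

    // Start the server on an ephemeral port and return the port number.
    int start() throws IOException {
        server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/jobs", exchange -> {
            String response;
            if ("POST".equals(exchange.getRequestMethod())) {
                int id = nextId.incrementAndGet();
                jobs.put(id, "SCHEDULED");
                response = "/jobs/" + id;    // POST: submit job, return its URL
            } else {
                response = jobs.toString();  // GET: list jobs and their states
            }
            byte[] body = response.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() { server.stop(0); }
}
```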
[jira] [Commented] (FLINK-2376) testFindConnectableAddress sometimes fails on Windows because of the time limit
[ https://issues.apache.org/jira/browse/FLINK-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635633#comment-14635633 ] ASF GitHub Bot commented on FLINK-2376: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/920 testFindConnectableAddress sometimes fails on Windows because of the time limit --- Key: FLINK-2376 URL: https://issues.apache.org/jira/browse/FLINK-2376 Project: Flink Issue Type: Bug Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2362) distinct is missing in DataSet API documentation
[ https://issues.apache.org/jira/browse/FLINK-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635630#comment-14635630 ] ASF GitHub Bot commented on FLINK-2362: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/922 distinct is missing in DataSet API documentation Key: FLINK-2362 URL: https://issues.apache.org/jira/browse/FLINK-2362 Project: Flink Issue Type: Bug Components: Documentation, Java API, Scala API Affects Versions: 0.9, 0.10 Reporter: Fabian Hueske Assignee: pietro pinoli Fix For: 0.10, 0.9.1 The DataSet transformation {{distinct}} is not described or listed in the documentation. It is not contained in the DataSet API programming guide (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html) and not in the DataSet Transformation (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/dataset_transformations.html) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2377) AbstractTestBase.deleteAllTempFiles sometimes fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635631#comment-14635631 ] ASF GitHub Bot commented on FLINK-2377: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/924 AbstractTestBase.deleteAllTempFiles sometimes fails on Windows -- Key: FLINK-2377 URL: https://issues.apache.org/jira/browse/FLINK-2377 Project: Flink Issue Type: Bug Components: Tests Environment: Windows Reporter: Gabor Gevay Priority: Minor This is probably another file closing issue. (that is, Windows won't delete open files, as opposed to Linux) I have encountered two concrete tests so far where this actually appears: CsvOutputFormatITCase and CollectionSourceTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2377] Add reader.close() to readAllResu...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/924 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2376] Raise the time limit in testFindC...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/920 ---
[GitHub] flink pull request: [FLINK-1938] Add Grunt for building the front-...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/623 ---
[jira] [Commented] (FLINK-1938) Add Grunt for building the front-end
[ https://issues.apache.org/jira/browse/FLINK-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635636#comment-14635636 ] ASF GitHub Bot commented on FLINK-1938: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/623 Add Grunt for building the front-end Key: FLINK-1938 URL: https://issues.apache.org/jira/browse/FLINK-1938 Project: Flink Issue Type: Improvement Components: Build System, Webfrontend Reporter: Vikhyat Korrapati Priority: Minor This is the first step towards implementing the web interface refactoring I proposed last year: https://groups.google.com/forum/#!topic/stratosphere-dev/GeXmDXF9DOY Once this is merged, I can get started with the rest of the refactoring. For now, the actual interface is kept the same, the only change is to how the build is done. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2377) AbstractTestBase.deleteAllTempFiles sometimes fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2377. - Resolution: Fixed Assignee: Gabor Gevay Fix Version/s: 0.10 Fixed via 29454de57edeeeb8b7ec404bfb8c08cba6e45781 AbstractTestBase.deleteAllTempFiles sometimes fails on Windows -- Key: FLINK-2377 URL: https://issues.apache.org/jira/browse/FLINK-2377 Project: Flink Issue Type: Bug Components: Tests Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.10 This is probably another file closing issue. (that is, Windows won't delete open files, as opposed to Linux) I have encountered two concrete tests so far where this actually appears: CsvOutputFormatITCase and CollectionSourceTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/720 ---
[GitHub] flink pull request: [FLINK-2374] Under Windows, skip tests that us...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/919 ---
[GitHub] flink pull request: [FLINK-2362] - distinct is missing in DataSet ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/922 ---
[GitHub] flink pull request: [FLINK-1228] Add REST Interface to JobManager
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/297 ---
[GitHub] flink pull request: [FLINK-1943] [gelly] Added GSA compiler and tr...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/916 ---
[GitHub] flink pull request: [FLINK-297] [web frontend] First part of JobMa...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/677 ---
[GitHub] flink pull request: [docs] Document some missing configuration par...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/915 ---
[jira] [Resolved] (FLINK-2362) distinct is missing in DataSet API documentation
[ https://issues.apache.org/jira/browse/FLINK-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2362. - Resolution: Fixed Fixed in 0.10 via 0818bec02c0bc90bae68f09d8d2e48dc0f6b9374 distinct is missing in DataSet API documentation Key: FLINK-2362 URL: https://issues.apache.org/jira/browse/FLINK-2362 Project: Flink Issue Type: Bug Components: Documentation, Java API, Scala API Affects Versions: 0.9, 0.10 Reporter: Fabian Hueske Assignee: pietro pinoli Fix For: 0.10, 0.9.1 The DataSet transformation {{distinct}} is not described or listed in the documentation. It is not contained in the DataSet API programming guide (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html) and not in the DataSet Transformation (https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/dataset_transformations.html) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-297) Redesign GUI client-server model
[ https://issues.apache.org/jira/browse/FLINK-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635632#comment-14635632 ] ASF GitHub Bot commented on FLINK-297: -- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/677 Redesign GUI client-server model Key: FLINK-297 URL: https://issues.apache.org/jira/browse/FLINK-297 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache Factor out job manager status information as a REST service running inside the same process. Implement the visualization server as a separate web application that runs on the client side and renders data fetched from the job manager RESTful API. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/297 Created by: [aalexandrov|https://github.com/aalexandrov] Labels: enhancement, gui, Created at: Tue Nov 26 14:54:53 CET 2013 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1943) Add Gelly-GSA compiler and translation tests
[ https://issues.apache.org/jira/browse/FLINK-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635635#comment-14635635 ] ASF GitHub Bot commented on FLINK-1943: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/916 Add Gelly-GSA compiler and translation tests Key: FLINK-1943 URL: https://issues.apache.org/jira/browse/FLINK-1943 Project: Flink Issue Type: Test Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri These should be similar to the corresponding Spargel tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2374) File.setWritable doesn't work on Windows, which makes several tests fail
[ https://issues.apache.org/jira/browse/FLINK-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635634#comment-14635634 ] ASF GitHub Bot commented on FLINK-2374: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/919 File.setWritable doesn't work on Windows, which makes several tests fail Key: FLINK-2374 URL: https://issues.apache.org/jira/browse/FLINK-2374 Project: Flink Issue Type: Bug Environment: Windows Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Trivial The tests that use setWritable to test the handling of certain error conditions should be skipped in case we are under Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
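Skipping the setWritable-based tests described above typically hinges on a simple OS check; a minimal sketch of the detection (in an actual JUnit test the result would feed something like Assume.assumeFalse(OsCheck.isWindows()), which is noted here only as the intended usage, not shown):

```java
class OsCheck {
    // Detect Windows from the JVM's os.name system property; tests that
    // rely on File.setWritable can then skip themselves on Windows.
    static boolean isWindows() {
        return System.getProperty("os.name").toLowerCase().startsWith("windows");
    }
}
```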
[jira] [Commented] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635862#comment-14635862 ] ASF GitHub Bot commented on FLINK-1658: --- GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/929 [FLINK-1658] Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent - deleted AbstractEvent in org.apache.flink.runtime.event.job (see comment section in https://issues.apache.org/jira/browse/FLINK-1658) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/flink flink-1658-AbstractEvent Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/929.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #929 commit 3cc8e3427fbfb61690417d6d1ee62745e2018662 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-07-21T19:54:08Z [FLINK-1658] Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent - deleted AbstractEvent in org.apache.flink.runtime.event.job (see comment section in https://issues.apache.org/jira/browse/FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent -- Key: FLINK-1658 URL: https://issues.apache.org/jira/browse/FLINK-1658 Project: Flink Issue Type: Improvement Components: Distributed Runtime, Local Runtime Reporter: Gyula Fora Assignee: Matthias J. Sax Priority: Trivial The same name is used for different event classes in the runtime which can cause confusion.
[jira] [Created] (FLINK-2390) Replace iteration timeout with algorithm for detecting termination
Gyula Fora created FLINK-2390: - Summary: Replace iteration timeout with algorithm for detecting termination Key: FLINK-2390 URL: https://issues.apache.org/jira/browse/FLINK-2390 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Gyula Fora Fix For: 0.10

Currently the user can set a timeout which shuts down the iteration source/sink nodes if no new data is received during that time, to allow program termination in iterative streaming jobs. This method is used due to the non-trivial nature of termination in iterative streaming jobs. While termination is not a main concern in long-running streaming jobs, this behaviour makes iterative tests non-deterministic, and they often fail on Travis due to the timeout. Setting a timeout can also cause jobs to terminate prematurely. I propose to remove iteration timeouts and replace them with the following algorithm for detecting termination:
- We first identify loop edges in the job graph (the channels from the iteration sources to the head operators).
- Once the head operators (the ones with loop input) finish with all their non-loop inputs, they broadcast a marker to their outputs.
- Each operator broadcasts a marker once it has received a marker from all of its non-finished inputs.
- Iteration sources are terminated when they receive 2 consecutive markers without receiving any record in between.

The idea behind the algorithm is to find out when no more outputs are generated by the operators inside an iteration after their normal inputs are finished.
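The termination rule for the iteration sources can be illustrated with a small stand-alone sketch (plain Java; the class and method names are hypothetical and not Flink API — this only models the "two consecutive markers, no record in between" condition from the proposal):

```java
/**
 * Toy model of the proposed termination rule for an iteration source:
 * terminate after two consecutive markers with no record in between.
 * Illustrative sketch only; names are hypothetical, not Flink API.
 */
public class IterationSourceModel {
    private boolean markerPending = false; // a marker arrived since the last record
    private boolean terminated = false;

    public void onRecord() {
        markerPending = false; // any record resets the consecutive-marker state
    }

    public void onMarker() {
        if (markerPending) {
            terminated = true; // second consecutive marker: the loop produced no more output
        }
        markerPending = true;
    }

    public boolean isTerminated() {
        return terminated;
    }

    public static void main(String[] args) {
        IterationSourceModel src = new IterationSourceModel();
        src.onRecord();
        src.onMarker();
        src.onMarker(); // two markers back to back -> terminate
        System.out.println(src.isTerminated()); // prints "true"
    }
}
```

A record arriving between the two markers would reset the state, which is exactly why the algorithm avoids the premature termination that a plain timeout can cause.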
[jira] [Commented] (FLINK-2363) Add an end-to-end overview of program execution in Flink to the docs
[ https://issues.apache.org/jira/browse/FLINK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635786#comment-14635786 ] ASF GitHub Bot commented on FLINK-2363: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/913#issuecomment-123472145 Hi @aljoscha, I haven't started and could do it but you can also give it a shot if you'd like to :smile: Add an end-to-end overview of program execution in Flink to the docs Key: FLINK-2363 URL: https://issues.apache.org/jira/browse/FLINK-2363 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Stephan Ewen
[jira] [Commented] (FLINK-2304) Add named attribute access to Storm compatibility layer
[ https://issues.apache.org/jira/browse/FLINK-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635883#comment-14635883 ] ASF GitHub Bot commented on FLINK-2304: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/878#issuecomment-123489145 Just rebased because #879 was merged today. This PR should be ready to get merged now. Add named attribute access to Storm compatibility layer --- Key: FLINK-2304 URL: https://issues.apache.org/jira/browse/FLINK-2304 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, Bolts running in Flink can access fields only by index. Enabling named index access is possible for whole topologies and Tuple-type as well as POJO type inputs for embedded Bolts.
[jira] [Assigned] (FLINK-1267) Add crossGroup operator
[ https://issues.apache.org/jira/browse/FLINK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pietro pinoli reassigned FLINK-1267: Assignee: pietro pinoli

Add crossGroup operator --- Key: FLINK-1267 URL: https://issues.apache.org/jira/browse/FLINK-1267 Project: Flink Issue Type: New Feature Components: Java API, Local Runtime, Optimizer, Scala API Affects Versions: 0.7.0-incubating Reporter: Fabian Hueske Assignee: pietro pinoli Priority: Minor

A common operation is to pair-wise compare or combine all elements of a group (there were two questions about this on the user mailing list recently). Right now, this can be done in two ways:
1. {{groupReduce}}: consume and store the complete iterator in memory and build all pairs.
2. do a self-{{Join}}: the engine builds all pairs of the full symmetric Cartesian product.

Both approaches have drawbacks. The {{groupReduce}} variant requires that the full group fits into memory and is more cumbersome to implement for the user, but pairs can be built arbitrarily. The self-{{Join}} approach pushes most of the work into the system, but the execution strategy does not treat the self-join differently from a regular join (both identical inputs are shuffled, etc.) and always builds the full symmetric Cartesian product. I propose to add a dedicated {{crossGroup()}} operator that offers this functionality in a proper way.
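The {{groupReduce}} variant described above — buffering the whole group and emitting every pair — can be sketched outside of Flink in plain Java (a hypothetical helper, only to show the quadratic pair construction and the memory drawback, not the eventual {{crossGroup()}} API):

```java
import java.util.ArrayList;
import java.util.List;

public class GroupPairs {
    /**
     * Builds all unordered pairs (i < j) of one group's elements, as a
     * groupReduce UDF would after materializing the iterator in memory.
     * A group of n elements needs O(n) memory and yields n*(n-1)/2 pairs --
     * exactly the drawback a dedicated crossGroup() operator could address.
     */
    static <T> List<List<T>> allPairs(List<T> group) {
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < group.size(); i++) {
            for (int j = i + 1; j < group.size(); j++) {
                pairs.add(List.of(group.get(i), group.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(allPairs(List.of("a", "b", "c")).size()); // prints "3"
    }
}
```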
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634755#comment-14634755 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35079328

--- Diff: flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java ---
@@ -109,125 +123,157 @@ public void testProgram() throws Exception {
 ExecutionEnvironment env = new PlanExtractor();
 DataSet<String> input = env.fromCollection(inputData);
 input
-    .flatMap(new Tokenizer())
     .flatMap(new WaitingUDF())
     .output(new WaitingOutputFormat());
 env.execute();
-/** Extract job graph **/
+// Extract job graph and set job id for the task to notify of accumulator changes.
 JobGraph jobGraph = getOptimizedPlan(((PlanExtractor) env).plan);
+jobID = jobGraph.getJobID();
+
+// register for accumulator changes
+jobManager.tell(new TestingJobManagerMessages.NotifyWhenAccumulatorChange(jobID), getRef());
+expectMsgEquals(true);
--- End diff --

When you wait for messages then this should always happen within a `within` block. Otherwise this might never terminate in a failure case.

AccumulatorLiveITCase fails --- Key: FLINK-2371 URL: https://issues.apache.org/jira/browse/FLINK-2371 Project: Flink Issue Type: Bug Components: Tests Reporter: Matthias J. Sax Assignee: Maximilian Michels AccumulatorLiveITCase fails regularly (however, not in each run). The test relies on timing (via sleep), which does not work well on Travis. See dev-list for more details: https://mail-archives.apache.org/mod_mbox/flink-dev/201507.mbox/browser AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634756#comment-14634756 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35079360

--- Diff: flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java ---
@@ -109,125 +123,157 @@ public void testProgram() throws Exception {
 ExecutionEnvironment env = new PlanExtractor();
 DataSet<String> input = env.fromCollection(inputData);
 input
-    .flatMap(new Tokenizer())
     .flatMap(new WaitingUDF())
     .output(new WaitingOutputFormat());
 env.execute();
-/** Extract job graph **/
+// Extract job graph and set job id for the task to notify of accumulator changes.
 JobGraph jobGraph = getOptimizedPlan(((PlanExtractor) env).plan);
+jobID = jobGraph.getJobID();
+
+// register for accumulator changes
+jobManager.tell(new TestingJobManagerMessages.NotifyWhenAccumulatorChange(jobID), getRef());
+expectMsgEquals(true);
+
 // submit job
 jobManager.tell(new JobManagerMessages.SubmitJob(jobGraph, false), getRef());
 expectMsgClass(Status.Success.class);
--- End diff --

Same here. Best if you nest all actor interaction in a `within` block.
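The reviewer's point — an unbounded wait can hang a failing test forever — can be illustrated with a stdlib-only Java sketch (Akka's `within` block gives the same guarantee for actor messages; `BoundedExpect` and its queue-based inbox are hypothetical stand-ins, not the test kit API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedExpect {
    /** Waits for an expected message, but never longer than the timeout. */
    static <T> T expectWithin(BlockingQueue<T> inbox, long timeoutMillis)
            throws InterruptedException {
        T msg = inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS); // bounded, unlike take()
        if (msg == null) {
            throw new AssertionError("no message within " + timeoutMillis + " ms");
        }
        return msg;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        inbox.add("Success");
        System.out.println(expectWithin(inbox, 100)); // prints "Success"
    }
}
```

With the timeout, a missing message fails the test promptly instead of blocking it indefinitely — the failure mode the `within` block is meant to rule out.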
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634763#comment-14634763 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35079637

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -400,20 +400,16 @@ class JobManager(
 import scala.collection.JavaConverters._
 sender ! RegisteredTaskManagers(instanceManager.getAllRegisteredInstances.asScala)

-case Heartbeat(instanceID, metricsReport, accumulators) =>
+case Heartbeat(instanceID, metricsReport, accumulators, asyncAccumulatorUpdate) =>
 log.debug(s"Received hearbeat message from $instanceID.")
-  Future {
-    accumulators foreach {
-      case accumulators =>
-        currentJobs.get(accumulators.getJobID) match {
-          case Some((jobGraph, jobInfo)) =>
-            jobGraph.updateAccumulators(accumulators)
-          case None =>
-            // ignore accumulator values for old job
-        }
-    }
-  }(context.dispatcher)
+  if (asyncAccumulatorUpdate) {
--- End diff --

Mixing test code in your production code is IMO bad style. If you need synchronous execution, then start the actor with a `CallingThreadDispatcher` or overwrite this message in the `TestingJobManager`.
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634622#comment-14634622 ] Chengxiang Li commented on FLINK-1901: -- To randomly choose a sample from a DataSet S, there exist two kinds of sampling requirement: sampling with a factor (such as randomly choosing 5 percent of the items in S) and sampling with a fixed size (such as randomly choosing 100 items from S). Besides, we do not know the size of S unless we take extra cost to compute it through DataSet::count().
# Sampling with a factor
#* With replacement: the expected sample size follows a [Poisson Distribution|https://en.wikipedia.org/wiki/Poisson_distribution] in this case, so Poisson sampling can be used to choose the sample items.
#* Without replacement: during sampling, we can treat the sampling of each item in the iterator as a [Bernoulli Trial|https://en.wikipedia.org/wiki/Bernoulli_trial].
# Sampling with a fixed size
#* Use DataSet::count() to get the dataset size; with the fixed size, we can turn this into sampling with a factor.
#* [Reservoir Sampling|https://en.wikipedia.org/wiki/Reservoir_sampling] is another commonly used algorithm to randomly choose a sample of k items from a list S containing n items, where n is either very large or unknown; there are different reservoir sampling algorithms that support both sampling with replacement and sampling without replacement.

Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms, we need a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
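Reservoir sampling (Algorithm R) — the fixed-size, single-pass option mentioned above — can be sketched in plain Java (a hypothetical helper for illustration, not the eventual Flink operator):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {
    /**
     * Algorithm R: uniform sample of k items without replacement from an
     * iterator of unknown length, in one pass and O(k) memory.
     */
    static <T> List<T> sample(Iterator<T> in, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        long n = 0; // items seen so far
        while (in.hasNext()) {
            T item = in.next();
            n++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill the reservoir with the first k items
            } else {
                long j = (long) (rnd.nextDouble() * n); // uniform index in [0, n)
                if (j < k) {
                    reservoir.set((int) j, item); // keep item with probability k/n
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        List<Integer> s = sample(data.iterator(), 10, new Random(42));
        System.out.println(s.size()); // prints "10"
    }
}
```

The seeded `Random` gives the reproducibility the issue asks for; variants of the algorithm also cover sampling with replacement.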
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634766#comment-14634766 ] ASF GitHub Bot commented on FLINK-2371: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/925#issuecomment-123212399 Is it bad that the accumulators are not serialized in a local setup? If not and you only want to test the serialization in a distributed setting, then you can also simply start the testing cluster with multiple `ActorSystems`. This should enforce serialization, if I'm not mistaken.
[GitHub] flink pull request: [FLINK-2371] improve AccumulatorLiveITCase
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/925#issuecomment-123213630 In the master, I only serialize the user-defined accumulators but not the Flink accumulators. As of this pull request, the accumulators are always serialized even in local setups. Starting multiple actor systems did not solve the problem because Akka always optimizes message passing in local one VM setups, i.e. it does not serialize objects even across actor systems. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2371] improve AccumulatorLiveITCase
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35086920

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/minicluster/FlinkMiniCluster.scala ---
@@ -51,10 +51,13 @@ import scala.concurrent.{ExecutionContext, Future, Await}
 abstract class FlinkMiniCluster(
     val userConfiguration: Configuration,
     val singleActorSystem: Boolean,
+    val synchronousDispatcher: Boolean,
     val streamingMode: StreamingMode) {

-  def this(userConfiguration: Configuration, singleActorSystem: Boolean)
-    = this(userConfiguration, singleActorSystem, StreamingMode.BATCH_ONLY)
+  def this(userConfiguration: Configuration,
+      singleActorSystem: Boolean,
+      synchronousDispatcher: Boolean)
+    = this(userConfiguration, singleActorSystem, synchronousDispatcher, StreamingMode.BATCH_ONLY)
--- End diff --

I think the field `synchronousDispatcher` should not be part of the `FlinkMiniCluster`. The field seems to be only used for testing, referenced in `TestingCluster`, but the `FlinkMiniCluster` is also used to execute Flink programs locally.
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634902#comment-14634902 ] ASF GitHub Bot commented on FLINK-2371: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/925#discussion_r35087953 Good catch. I noticed that as well and already changed that before I saw your answer. Now only the `TestingCluster` is affected by the changes.
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/879
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/906#issuecomment-123260957

    Manually merged
[jira] [Resolved] (FLINK-2295) TwoInput Task do not react to/forward checkpoint barriers
[ https://issues.apache.org/jira/browse/FLINK-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek resolved FLINK-2295.
Resolution: Fixed
Fix Version/s: 0.10

Fixed in a2eb6cc8774ab43475829b0b691e62739fbbe88b

TwoInput Task do not react to/forward checkpoint barriers
Key: FLINK-2295
URL: https://issues.apache.org/jira/browse/FLINK-2295
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.9, 0.10, 0.9.1
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker
Fix For: 0.10

The event listener for the checkpoint barriers was never enabled for TwoInput tasks. I have a fix for it and also tests that verify that it actually works.
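The bug above means a two-input task never reacted to barriers at all. The essential behavior it was missing can be sketched as follows; this is a simplified, hypothetical illustration of barrier alignment, not Flink's actual task code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a two-input task must listen for checkpoint barriers
// on *both* input gates and forward the barrier downstream only once it has
// arrived on every input (barrier alignment).
class TwoInputBarrierSketch {
    private final boolean[] barrierSeen = new boolean[2];
    private long currentCheckpoint = -1;
    final List<Long> forwarded = new ArrayList<>();

    // Called when a barrier for `checkpointId` arrives on input 0 or 1.
    void onBarrier(int inputIndex, long checkpointId) {
        if (checkpointId != currentCheckpoint) {
            // A barrier for a new checkpoint starts a fresh alignment round.
            currentCheckpoint = checkpointId;
            barrierSeen[0] = barrierSeen[1] = false;
        }
        barrierSeen[inputIndex] = true;
        if (barrierSeen[0] && barrierSeen[1]) {
            forwarded.add(checkpointId); // all inputs aligned: forward downstream
        }
    }
}
```

If the listener is never registered (the bug in FLINK-2295), `onBarrier` is simply never called and barriers are neither acted upon nor forwarded.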
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634934#comment-14634934 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/906#issuecomment-123260957

    Manually merged

Introduce (Event)time in Streaming
Key: FLINK-1967
URL: https://issues.apache.org/jira/browse/FLINK-1967
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

This requires introducing a timestamp in the streaming record and a change in the sources to add timestamps to records. This will also introduce punctuations (or low watermarks) to allow windows to work correctly on unordered, timestamped input data. In the process of this, the windowing subsystem also needs to be adapted to use the punctuations. Furthermore, all operators need to be made aware of punctuations and correctly forward them. Then, a new operator must be introduced to allow modification of timestamps.
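The requirement above that all operators "correctly forward" punctuations has a simple core: an operator with several inputs may only advance its event time to the minimum watermark across all inputs, since the slower input can still deliver earlier elements. A hypothetical sketch of that rule (illustrative names, not Flink's API):

```java
import java.util.Arrays;

// Illustrative sketch: track the latest watermark per input; the operator's
// own watermark is the minimum over all inputs, and only that minimum may be
// forwarded downstream.
class WatermarkTracker {
    private final long[] inputWatermarks;

    WatermarkTracker(int numInputs) {
        inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE); // no watermark seen yet
    }

    // Record a watermark on one input; return the combined watermark the
    // operator may emit (watermarks never move backwards per input).
    long onWatermark(int input, long watermark) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        long min = Long.MAX_VALUE;
        for (long w : inputWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```

Until every input has reported a watermark, the combined value stays at `Long.MIN_VALUE`, so no window downstream fires prematurely.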
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634937#comment-14634937 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/879
[jira] [Resolved] (FLINK-2301) In BarrierBuffer newer Barriers trigger old Checkpoints
[ https://issues.apache.org/jira/browse/FLINK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek resolved FLINK-2301.
Resolution: Fixed
Fix Version/s: 0.10

Fixed in a2eb6cc8774ab43475829b0b691e62739fbbe88b

In BarrierBuffer newer Barriers trigger old Checkpoints
Key: FLINK-2301
URL: https://issues.apache.org/jira/browse/FLINK-2301
Project: Flink
Issue Type: Bug
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 0.10

When the BarrierBuffer has some inputs blocked on barrier 0 and then receives barriers for barrier 1 on the other inputs, this makes the BarrierBuffer process the checkpoint with id 0. I think the BarrierBuffer should drop all previously pending checkpoints when it receives a barrier from a more recent checkpoint and unblock the previously blocked channels. This makes it ready to correctly react to the other barriers of the newer checkpoint. It should also ignore barriers that arrive late, once a more recent checkpoint has already been processed.
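The fix proposed above can be sketched in a few lines; this is a hypothetical simplification (it counts barriers instead of tracking blocked channels, and the names are not Flink's), but it shows the two rules: a newer barrier supersedes the pending checkpoint, and late barriers for superseded checkpoints are ignored:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the barrier-buffer fix: only the newest checkpoint
// is ever pending; older pending state is abandoned when a newer barrier
// arrives, and stragglers from superseded checkpoints are dropped.
class BarrierBufferSketch {
    private final int numInputs;
    private long pendingCheckpoint = -1;
    private int barriersReceived;
    final List<Long> completed = new ArrayList<>();

    BarrierBufferSketch(int numInputs) { this.numInputs = numInputs; }

    void onBarrier(long checkpointId) {
        if (checkpointId < pendingCheckpoint) {
            return; // late barrier from a superseded checkpoint: ignore
        }
        if (checkpointId > pendingCheckpoint) {
            // Newer checkpoint supersedes the pending one: drop old state
            // (in the real buffer, also unblock the blocked channels).
            pendingCheckpoint = checkpointId;
            barriersReceived = 0;
        }
        if (++barriersReceived == numInputs) {
            completed.add(checkpointId); // all inputs aligned: trigger checkpoint
        }
    }
}
```

Without the `checkpointId > pendingCheckpoint` reset, barriers from checkpoint 1 would be counted toward checkpoint 0, which is exactly the bug described.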
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/906
[jira] [Closed] (FLINK-2290) CoRecordReader Does Not Read Events From Both Inputs When No Elements Arrive
[ https://issues.apache.org/jira/browse/FLINK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-2290.
Resolution: Fixed
Fix Version/s: 0.10

Fixed in a2eb6cc8774ab43475829b0b691e62739fbbe88b

CoRecordReader Does Not Read Events From Both Inputs When No Elements Arrive
Key: FLINK-2290
URL: https://issues.apache.org/jira/browse/FLINK-2290
Project: Flink
Issue Type: Bug
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker
Fix For: 0.10

When no elements arrive, the reader will always try to read from the same input index. This means that it will only process elements from this input. This could be problematic with Watermarks/Checkpoint barriers.
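The failure mode above is starvation: if the reader always starts polling at input 0, events queued on input 1 (such as watermarks or barriers) are never seen while input 0 is idle. A hypothetical sketch of the obvious remedy, rotating which input is polled first (not Flink's actual reader code):

```java
// Illustrative sketch: rotate the input index between polls so that both
// inputs are serviced even when one of them delivers no data elements.
class AlternatingReaderSketch {
    private int nextToCheck = 0;

    // Returns the index of the input to poll next, cycling 0, 1, 0, 1, ...
    int nextInput(int numInputs) {
        int idx = nextToCheck;
        nextToCheck = (nextToCheck + 1) % numInputs;
        return idx;
    }
}
```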
[GitHub] flink pull request: [FLINK-1967] Introduce (Event)time in Streamin...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/879#issuecomment-123261017

    Superseded by #906
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634935#comment-14634935 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/906
[jira] [Closed] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-1967.
Resolution: Fixed

Implemented in a2eb6cc8774ab43475829b0b691e62739fbbe88b
[jira] [Closed] (FLINK-2295) TwoInput Task do not react to/forward checkpoint barriers
[ https://issues.apache.org/jira/browse/FLINK-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-2295.
[jira] [Closed] (FLINK-2301) In BarrierBuffer newer Barriers trigger old Checkpoints
[ https://issues.apache.org/jira/browse/FLINK-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-2301.
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634936#comment-14634936 ]

ASF GitHub Bot commented on FLINK-1967:

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/879#issuecomment-123261017

    Superseded by #906
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634664#comment-14634664 ]

Till Rohrmann commented on FLINK-1901:

Hi Chengxiang, good to hear that you want to work on this. I can assign you the ticket. However, it is not only about the sampling strategy but also about the integration within Flink. The reason is that we have to make sure that the sampling operator also works within iterations. This means that it has to be part of the dynamic path so that it is triggered anew for every iteration. This will need a special operator type. But you can start with the sampling strategies and then continue with the iteration integration.

Create sample operator for Dataset
Key: FLINK-1901
URL: https://issues.apache.org/jira/browse/FLINK-1901
Project: Flink
Issue Type: Improvement
Components: Core
Reporter: Theodore Vasiloudis

In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms, we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
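The two sampling modes the issue asks for (with/without replacement, relative size, fixed seed) can be sketched locally before worrying about the operator integration. This is a hypothetical illustration, not the eventual Flink operator API; `sampleWithoutReplacement` uses simple Bernoulli sampling, where each element is kept independently with the given fraction:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sampling sketch: a fixed seed makes both variants reproducible.
class SampleSketch {
    // Without replacement: keep each element independently with probability `fraction`.
    static <T> List<T> sampleWithoutReplacement(List<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T t : data) {
            if (rnd.nextDouble() < fraction) {
                out.add(t);
            }
        }
        return out;
    }

    // With replacement: draw `n` elements uniformly; duplicates are allowed.
    static <T> List<T> sampleWithReplacement(List<T> data, int n, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(data.get(rnd.nextInt(data.size())));
        }
        return out;
    }
}
```

As the comment notes, the harder part in Flink is making such an operator part of the dynamic path so it re-draws the sample in every iteration; the strategies themselves are the easy half.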
[jira] [Created] (FLINK-2383) Add new Window Operator that can use Timestamps/Watermarks
Aljoscha Krettek created FLINK-2383:

Summary: Add new Window Operator that can use Timestamps/Watermarks
Key: FLINK-2383
URL: https://issues.apache.org/jira/browse/FLINK-2383
Project: Flink
Issue Type: Improvement
Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek

This should mostly be based on the design documents in the Wiki:
https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Streams
https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams
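The core mechanic such a window operator needs is small: buffer elements by the window their timestamp falls into, and fire a window once a watermark passes its end. A hypothetical sketch of tumbling event-time windows (illustrative only, not the design from the linked documents; assumes non-negative timestamps):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch: elements are grouped into tumbling windows by
// timestamp; a window is emitted when a watermark reaches its end, so
// out-of-order (but not late) elements still land in the right window.
class EventTimeWindowSketch {
    private final long windowSize;
    private final TreeMap<Long, List<Long>> windows = new TreeMap<>();
    final List<List<Long>> fired = new ArrayList<>();

    EventTimeWindowSketch(long windowSize) { this.windowSize = windowSize; }

    void onElement(long timestamp) {
        long windowStart = timestamp - (timestamp % windowSize);
        windows.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(timestamp);
    }

    // Fire every buffered window whose end is at or below the watermark.
    void onWatermark(long watermark) {
        while (!windows.isEmpty() && windows.firstKey() + windowSize <= watermark) {
            fired.add(windows.pollFirstEntry().getValue());
        }
    }
}
```

The operator never fires on element arrival; only the watermark, carrying the guarantee that no earlier elements are still in flight, triggers emission.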
[jira] [Assigned] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax reassigned FLINK-1658:
Assignee: Matthias J. Sax

Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
Key: FLINK-1658
URL: https://issues.apache.org/jira/browse/FLINK-1658
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime, Local Runtime
Reporter: Gyula Fora
Assignee: Matthias J. Sax
Priority: Trivial

The same name is used for different event classes in the runtime, which can cause confusion.
[GitHub] flink pull request: [FLINK-2371] improve AccumulatorLiveITCase
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/925#issuecomment-123283973

    Thanks for the feedback. Merging this later on if there are no objections.
[jira] [Comment Edited] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635006#comment-14635006 ]

Matthias J. Sax edited comment on FLINK-1658 at 7/21/15 12:02 PM:

I still see `org.apache.flink.runtime.event.job.AbstractEvent`. However, there is no sub-class for it. I guess it can be deleted. However, there is:
- `org.apache.flink.runtime.event.task.AbstractEvent`
- `org.apache.flink.runtime.event.task.TaskEvent` (which is actually abstract)
- `org.apache.flink.runtime.event.task.RuntimeEvent` (which is actually abstract)

Maybe TaskEvent and RuntimeEvent should be renamed into AbstractTaskEvent and AbstractRuntimeEvent? However, I just discovered that TaskEvent was introduced in d908ca19741bf2561cb6a7663541f642e60c0e6d as a renaming of AbstractTaskEvent (for whatever reason, an abstract class was renamed to not start with Abstract?).