[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132128284 +1 for having two different methods by return type, but we need more comments from @tillrohrmann, @thvasilo or other people, because I'm not sure this is the best approach. Would the method names `createContinuousHistogram` and `createCategoricalHistogram` be okay if we decide to create two methods? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
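[Editor's sketch] The reason two method names are needed at all is that Java and Scala cannot overload on return type alone. A minimal, hypothetical Java outline of the two proposed entry points (the actual Flink ML code is in Scala; only the method names come from the discussion, everything else here is illustrative):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: distinct names because overloading by return type
// alone is impossible. Not the actual Flink ML implementation.
class HistogramSketch {

    // Continuous histogram: a fixed number of equal-width bins over doubles.
    static double[] createContinuousHistogram(double[] values, int numBins) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / numBins;
        double[] counts = new double[numBins];
        for (double v : values) {
            // Clamp the top edge into the last bin.
            int bin = width == 0 ? 0 : Math.min((int) ((v - min) / width), numBins - 1);
            counts[bin]++;
        }
        return counts;
    }

    // Categorical histogram: one count per distinct category value.
    static Map<Double, Long> createCategoricalHistogram(double[] values) {
        Map<Double, Long> counts = new TreeMap<>();
        for (double v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }
}
```

The split mirrors the return-type question in the thread: the continuous variant naturally returns bin counts, while the categorical variant returns a per-category map.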
[jira] [Commented] (FLINK-2448) registerCacheFile fails with MultipleProgramsTestbase
[ https://issues.apache.org/jira/browse/FLINK-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700983#comment-14700983 ] ASF GitHub Bot commented on FLINK-2448: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1031#issuecomment-132136444 I am not sure this is solving it the right way. I find it more intuitive if cache files stay registered. They are part of the environment and should be sticky, like configuration settings (they don't get lost after execution either). We could add a method to clear registered cache files. But if the problem really only affects MultipleProgramsTestBase, then that should be fixed to return a new ExecutionEnvironment from a (the) TestEnvironmentFactory every time you call `getExecutionEnvironment()`. registerCacheFile fails with MultipleProgramsTestbase - Key: FLINK-2448 URL: https://issues.apache.org/jira/browse/FLINK-2448 Project: Flink Issue Type: Bug Components: Tests Reporter: Chesnay Schepler Priority: Minor When trying to register a file using a constant name, an exception is thrown saying the file was already cached. This is probably because the same environment is reused and the cacheFile entries are not cleared between runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2448]Clear cache file list in Execution...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1031#issuecomment-132136444 I am not sure this is solving it the right way. I find it more intuitive if cache files stay registered. They are part of the environment and should be sticky, like configuration settings (they don't get lost after execution either). We could add a method to clear registered cache files. But if the problem really only affects MultipleProgramsTestBase, then that should be fixed to return a new ExecutionEnvironment from a (the) TestEnvironmentFactory every time you call `getExecutionEnvironment()`.
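[Editor's sketch] The fix suggested above can be illustrated with a toy factory; all class names below are hypothetical stand-ins, not Flink's actual TestEnvironmentFactory. The point is simply that a fresh environment per `getExecutionEnvironment()` call means cache-file registrations from a previous test run cannot collide:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy environment: registering the same cache-file name twice throws,
// which is the failure mode described in FLINK-2448.
class SketchEnvironment {
    final Map<String, String> cacheFiles = new HashMap<>();

    void registerCachedFile(String path, String name) {
        if (cacheFiles.containsKey(name)) {
            throw new IllegalStateException("File with name '" + name + "' is already cached.");
        }
        cacheFiles.put(name, path);
    }
}

// The suggested fix: the factory hands out a NEW environment on every call
// instead of reusing one across test programs.
class SketchEnvironmentFactory {
    static final Supplier<SketchEnvironment> FACTORY = SketchEnvironment::new;

    static SketchEnvironment getExecutionEnvironment() {
        return FACTORY.get();
    }
}
```

With this shape, two successive test programs each get an independent environment, so re-registering the same constant name no longer throws.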
[GitHub] flink pull request: [FLINK-2448]Clear cache file list in Execution...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1031#issuecomment-132140770 I'm not sure if I can form an opinion on this. I personally haven't used them at all with Flink, only with Hadoop. A call to clear cache files certainly makes sense though.
[jira] [Commented] (FLINK-2448) registerCacheFile fails with MultipleProgramsTestbase
[ https://issues.apache.org/jira/browse/FLINK-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701000#comment-14701000 ] ASF GitHub Bot commented on FLINK-2448: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1031#issuecomment-132140770 I'm not sure if I can form an opinion on this. I personally haven't used them at all with Flink, only with Hadoop. A call to clear cache files certainly makes sense though. registerCacheFile fails with MultipleProgramsTestbase - Key: FLINK-2448 URL: https://issues.apache.org/jira/browse/FLINK-2448 Project: Flink Issue Type: Bug Components: Tests Reporter: Chesnay Schepler Priority: Minor When trying to register a file using a constant name, an exception is thrown saying the file was already cached. This is probably because the same environment is reused and the cacheFile entries are not cleared between runs.
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132146208 Sounds good.
[GitHub] flink pull request: [flink-2532]fix the function name and the vari...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/1025#issuecomment-132153447 Hey @Rucongzhang, As this is your first contribution to Flink and the PR was issued before a relevant notice on the developer mailing list that advises against merging such changes [1], I am merging this - but I highlight that the Flink community seems to have an agreement that these kinds of changes should mostly be rejected in the future. I strongly agree with @fhueske's comment on the mailing list [2]: you can find plenty of relevant, already open issues on JIRA, and I would suggest picking up one of those. If an issue you like happens to be a bit unclear, the community is happy to give you directions. Happy coding Flink. :smile: [1] https://mail-archives.apache.org/mod_mbox/flink-dev/201508.mbox/%3CCANC1h_sTnwSunrjsBY%2BVHrBM2wimkAY7HDKfjM7ZWSx6pYRDFg%40mail.gmail.com%3E [2] https://mail-archives.apache.org/mod_mbox/flink-dev/201508.mbox/%3CCAAdrtT02%3Dk6WNKVeROpqzdW78zn%2Bhv-OX8BD-uQ17S7vtV3QbQ%40mail.gmail.com%3E
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132170286 @thvasilo , @chiwanpark , I've made the required changes.
[jira] [Commented] (FLINK-2529) fix on some unused code for flink-runtime
[ https://issues.apache.org/jira/browse/FLINK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701077#comment-14701077 ] ASF GitHub Bot commented on FLINK-2529: --- Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132172991 @StephanEwen Thank you! Do you mean that this PR will not be closed and I can push future checks in it? fix on some unused code for flink-runtime - Key: FLINK-2529 URL: https://issues.apache.org/jira/browse/FLINK-2529 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h In the file BlobServer.java, I found that Thread.currentThread() will never return null, to my knowledge. So I think the `shutdownHook != null` check is not necessary in the code `if (shutdownHook != null && shutdownHook != Thread.currentThread())`. But I am not completely sure. Maybe I am wrong.
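[Editor's sketch] The guard under discussion follows a common shutdown-hook pattern; the class below is an illustrative reconstruction, not the actual BlobServer code. The `shutdownHook != null` part covers the case where no hook was ever registered (the field can legitimately stay null), which is why it is a reasonable sanity check even though `Thread.currentThread()` itself never returns null:

```java
// Illustrative shutdown-hook guard (NOT the actual BlobServer code).
class ShutdownSketch {
    private Thread shutdownHook; // may legitimately remain null

    void registerHook() {
        shutdownHook = new Thread(this::shutdown);
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    void shutdown() {
        // Deregister only when (a) a hook was registered at all and
        // (b) we are not currently running INSIDE the hook thread:
        // removeShutdownHook throws IllegalStateException during shutdown.
        if (shutdownHook != null && shutdownHook != Thread.currentThread()) {
            try {
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            } catch (IllegalStateException e) {
                // JVM already shutting down - nothing to do.
            }
        }
    }
}
```

So the null check is not about `Thread.currentThread()` returning null; it protects the `shutdownHook` field itself.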
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132118352 What about having two different functions, one for discrete and one for continuous? Or perhaps just one for the continuous case, as that is more likely to be used.
[jira] [Updated] (FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop
[ https://issues.apache.org/jira/browse/FLINK-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2540: --- Component/s: Core LocalBufferPool.requestBuffer gets into infinite loop - Key: FLINK-2540 URL: https://issues.apache.org/jira/browse/FLINK-2540 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay I'm trying to run a complicated computation that looks like this: [1]. One of the DataSource-Filter-Map chains finishes fine, but the other one freezes. Debugging shows that it is spinning in the while loop in LocalBufferPool.requestBuffer. askToRecycle is false. Both numberOfRequestedMemorySegments and currentPoolSize are 128, so it never enters either of those if branches. This is a stack trace: [2] And here is the code, if you would like to run it: [3]. Unfortunately, I can't make it more minimal, because if I remove some operators, the problem disappears. The class to start is malom.Solver. (On the first run, it calculates some lookup tables for a few minutes and puts them into /tmp/movegen.) [1] http://compalg.inf.elte.hu/~ggevay/flink/plan.txt [2] http://compalg.inf.elte.hu/~ggevay/flink/stacktrace.txt [3] https://github.com/ggevay/flink/tree/deadlock-malom
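[Editor's sketch] The spin described in the report can be reconstructed schematically; the class below is an illustrative model, not the actual LocalBufferPool source. With `askToRecycle == false` and `numberOfRequestedMemorySegments == currentPoolSize`, neither branch inside the loop can ever produce a buffer, so the loop spins forever:

```java
// Illustrative model of the reported busy-wait (NOT Flink's LocalBufferPool).
class BufferPoolSketch {
    int numberOfRequestedMemorySegments = 128;
    final int currentPoolSize = 128;
    boolean askToRecycle = false;
    Object availableBuffer = null;

    Object requestBuffer(int maxSpins) {
        int spins = 0;
        while (availableBuffer == null) {
            if (numberOfRequestedMemorySegments < currentPoolSize) {
                availableBuffer = new Object(); // request a fresh segment
                numberOfRequestedMemorySegments++;
            } else if (askToRecycle) {
                availableBuffer = new Object(); // reclaim a segment
            }
            // Bounded only so this sketch terminates; the real loop has no bound,
            // which is exactly the reported infinite spin.
            if (++spins >= maxSpins) {
                return null;
            }
        }
        return availableBuffer;
    }
}
```

In the reported state (128 == 128, no recycling requested) the method can only exit if some other thread recycles a buffer; if none does, this is a deadlock.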
[GitHub] flink pull request: [FLINK-2529][runtime]remove some unused code
Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132172991 @StephanEwen Thank you! Do you mean that this PR will not be closed and I can push future checks in it?
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132134747 Okay. That sounds good enough. Can we make a final decision then?
[GitHub] flink pull request: [FLINK-2529][runtime]remove some unused code
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132138231 The change in the execution class is good. I'll leave the other check in, as it is a good sanity check against future mistakes.
[jira] [Created] (FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop
Gabor Gevay created FLINK-2540: -- Summary: LocalBufferPool.requestBuffer gets into infinite loop Key: FLINK-2540 URL: https://issues.apache.org/jira/browse/FLINK-2540 Project: Flink Issue Type: Bug Reporter: Gabor Gevay I'm trying to run a complicated computation that looks like this: [1]. One of the DataSource-Filter-Map chains finishes fine, but the other one freezes. Debugging shows that it is spinning in the while loop in LocalBufferPool.requestBuffer. askToRecycle is false. Both numberOfRequestedMemorySegments and currentPoolSize are 128, so it never enters either of those if branches. This is a stack trace: [2] And here is the code, if you would like to run it: [3]. Unfortunately, I can't make it more minimal, because if I remove some operators, the problem disappears. The class to start is malom.Solver. (On the first run, it calculates some lookup tables for a few minutes and puts them into /tmp/movegen.) [1] http://compalg.inf.elte.hu/~ggevay/flink/plan.txt [2] http://compalg.inf.elte.hu/~ggevay/flink/stacktrace.txt [3] https://github.com/ggevay/flink/tree/deadlock-malom
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701026#comment-14701026 ] ASF GitHub Bot commented on FLINK-1901: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-132152536 Hi @tillrohrmann, do you have some time to continue reviewing this PR and help push its progress? Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms, we need a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative or exact size of the sample, set a seed for reproducibility, and support sampling within iterations.
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-132152536 Hi @tillrohrmann, do you have some time to continue reviewing this PR and help push its progress?
[jira] [Commented] (FLINK-2366) HA Without ZooKeeper
[ https://issues.apache.org/jira/browse/FLINK-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700876#comment-14700876 ] Stephan Ewen commented on FLINK-2366: - The leader election service is pluggable in Flink's runtime. It should be relatively easy to plug in other services. Let's see what users request... HA Without ZooKeeper Key: FLINK-2366 URL: https://issues.apache.org/jira/browse/FLINK-2366 Project: Flink Issue Type: Improvement Reporter: Suminda Dharmasena Priority: Minor Please provide a way to do HA without having to use ZK.
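[Editor's sketch] "Pluggable" here means the runtime talks to leader election through an interface, so another coordination backend can be swapped in. The names and shapes below are hypothetical illustrations, not Flink's actual API:

```java
import java.util.UUID;

// Hypothetical plug-in point: any backend (ZooKeeper, Consul, etcd, ...)
// would implement this interface.
interface LeaderElectionServiceSketch {
    void start(LeaderContenderSketch contender) throws Exception;
    void stop() throws Exception;
}

// The component competing for leadership gets callbacks from the service.
interface LeaderContenderSketch {
    void grantLeadership(UUID leaderSessionId);
    void revokeLeadership();
}

// Trivial standalone backend: a single process has no contention,
// so leadership is granted immediately on start.
class StandaloneLeaderElectionSketch implements LeaderElectionServiceSketch {
    private LeaderContenderSketch contender;

    public void start(LeaderContenderSketch contender) {
        this.contender = contender;
        contender.grantLeadership(UUID.randomUUID());
    }

    public void stop() {
        if (contender != null) {
            contender.revokeLeadership();
        }
    }
}
```

A Serf/Consul/Copycat backend, as requested in the issue, would be another implementation of the same interface delegating to that system's coordination primitives.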
[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132115823 Looks good to merge except one cosmetic issue. Awesome job! :)
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700903#comment-14700903 ] ASF GitHub Bot commented on FLINK-1962: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132115823 Looks good to merge except one cosmetic issue. Awesome job! :) Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken
[jira] [Commented] (FLINK-2477) Add test for SocketClientSink
[ https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700841#comment-14700841 ] ASF GitHub Bot commented on FLINK-2477: --- Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/977#issuecomment-132100631 @StephanEwen Hi, I'm really very sorry to bother you again. I just wonder whether this can be merged, or whether I need to make some more fixes or close this PR? Add test for SocketClientSink - Key: FLINK-2477 URL: https://issues.apache.org/jira/browse/FLINK-2477 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.10 Environment: win7 32bit; linux Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Add some tests for SocketClientSink.
[GitHub] flink pull request: [FLINK-2477][Add]Add tests for SocketClientSin...
Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/977#issuecomment-132100631 @StephanEwen Hi, I'm really very sorry to bother you again. I just wonder whether this can be merged, or whether I need to make some more fixes or close this PR?
[GitHub] flink pull request: [FLINK-2521] [tests] Adds automatic test name ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1015#issuecomment-132118469 +1
[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2
Github user PieterJanVanAeken commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132118620 I fixed the unnecessary comment and did a quick rebase to keep up with the master branch.
[jira] [Commented] (FLINK-2521) Add automatic test name logging for tests
[ https://issues.apache.org/jira/browse/FLINK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700918#comment-14700918 ] ASF GitHub Bot commented on FLINK-2521: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1015#issuecomment-132118469 +1 Add automatic test name logging for tests - Key: FLINK-2521 URL: https://issues.apache.org/jira/browse/FLINK-2521 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Priority: Minor When running tests on Travis, the Flink components log to a file. This is helpful for retrieving the error in case of a failed test. However, the log does not contain the test name or the reason for the failure, so it is difficult to find the log output that corresponds to the failed test. It would be nice to automatically add the test case information to the log. This would ease the debugging process big time.
[jira] [Commented] (FLINK-2366) HA Without ZooKeeper
[ https://issues.apache.org/jira/browse/FLINK-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700776#comment-14700776 ] Suminda Dharmasena commented on FLINK-2366: --- Can you consider something simpler? Give the end user the ability to choose. E.g. the user could choose https://www.serfdom.io/ or https://www.consul.io/ in case they do not want to use ZK. Also https://github.com/kuujo/copycat. HA Without ZooKeeper Key: FLINK-2366 URL: https://issues.apache.org/jira/browse/FLINK-2366 Project: Flink Issue Type: Improvement Reporter: Suminda Dharmasena Priority: Minor Please provide a way to do HA without having to use ZK.
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132116059 I think that we can merge this PR after we decide the return type of `createHistogram` method. Any other points seem okay.
[jira] [Created] (FLINK-2539) More unified code style for Scala code
Chiwan Park created FLINK-2539: -- Summary: More unified code style for Scala code Key: FLINK-2539 URL: https://issues.apache.org/jira/browse/FLINK-2539 Project: Flink Issue Type: Improvement Reporter: Chiwan Park Priority: Minor We need a more specific code style guide for Scala to prevent code style differentiation. We discussed this in the [mailing list|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Code-style-guideline-for-Scala-td7526.html]. The following work is needed: * Providing code formatting configuration for Eclipse and IntelliJ IDEA * A more detailed description in the [wiki|https://cwiki.apache.org/confluence/display/FLINK/Coding+Guidelines+for+Scala] and the [Coding Guidelines|http://flink.apache.org/coding-guidelines.html] on the homepage * Stricter rules in the Scala style checker (we need to discuss this more)
[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132119301 All right, let's merge this!
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700927#comment-14700927 ] ASF GitHub Bot commented on FLINK-1962: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132119301 All right, let's merge this! Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken
[jira] [Commented] (FLINK-2521) Add automatic test name logging for tests
[ https://issues.apache.org/jira/browse/FLINK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700834#comment-14700834 ] ASF GitHub Bot commented on FLINK-2521: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1015#issuecomment-132097606 Good idea @StephanEwen, I'll make the log statements more prominent and then I'll merge the PR. Add automatic test name logging for tests - Key: FLINK-2521 URL: https://issues.apache.org/jira/browse/FLINK-2521 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Priority: Minor When running tests on Travis, the Flink components log to a file. This is helpful for retrieving the error in case of a failed test. However, the log does not contain the test name or the reason for the failure, so it is difficult to find the log output that corresponds to the failed test. It would be nice to automatically add the test case information to the log. This would ease the debugging process big time.
[GitHub] flink pull request: [FLINK-2521] [tests] Adds automatic test name ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1015#issuecomment-132097606 Good idea @StephanEwen, I'll make the log statements more prominent and then I'll merge the PR.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700901#comment-14700901 ] ASF GitHub Bot commented on FLINK-1962: --- Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/1004#discussion_r37273914 --- Diff: flink-staging/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala --- @@ -0,0 +1,735 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph.scala + +import org.apache.flink.api.common.functions.{FilterFunction, MapFunction} +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.{tuple => jtuple} +import org.apache.flink.api.scala._ +import org.apache.flink.graph._ +import org.apache.flink.graph.gsa.{ApplyFunction, GSAConfiguration, GatherFunction, SumFunction} +import org.apache.flink.graph.spargel.{MessagingFunction, VertexCentricConfiguration, VertexUpdateFunction} +import org.apache.flink.{graph => jg} + +import _root_.scala.collection.JavaConverters._ +import _root_.scala.reflect.ClassTag + +object Graph { + def fromDataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: + TypeInformation : ClassTag](vertices: DataSet[Vertex[K, VV]], edges: DataSet[Edge[K, EV]], + env: ExecutionEnvironment): Graph[K, VV, EV] = { +wrapGraph(jg.Graph.fromDataSet[K, VV, EV](vertices.javaSet, edges.javaSet, env.getJavaEnv)) + } + + def fromCollection[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: + TypeInformation : ClassTag](vertices: Seq[Vertex[K, VV]], edges: Seq[Edge[K, EV]], env: + ExecutionEnvironment): Graph[K, VV, EV] = { +wrapGraph(jg.Graph.fromCollection[K, VV, EV](vertices.asJavaCollection, edges + .asJavaCollection, env.getJavaEnv)) + } +} + +/** + * Represents a graph consisting of {@link Edge edges} and {@link Vertex vertices}. + * @param jgraph the underlying java api Graph.
+ * @tparam K the key type for vertex and edge identifiers + * @tparam VV the value type for vertices + * @tparam EV the value type for edges + * @see org.apache.flink.graph.Edge + * @see org.apache.flink.graph.Vertex + */ +final class Graph[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: +TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) { + + private[flink] def getWrappedGraph = jgraph + + + private[flink] def clean[F <: AnyRef](f: F, checkSerializable: Boolean = true): F = { +if (jgraph.getContext.getConfig.isClosureCleanerEnabled) { + ClosureCleaner.clean(f, checkSerializable) +} +ClosureCleaner.ensureSerializable(f) +f + } + + /** + * @return the vertex DataSet. + */ + def getVertices = wrap(jgraph.getVertices) + + /** + * @return the edge DataSet. + */ + def getEdges = wrap(jgraph.getEdges) + + /** + * @return the vertex DataSet as Tuple2. + */ + def getVerticesAsTuple2(): DataSet[(K, VV)] = { +wrap(jgraph.getVerticesAsTuple2).map(jtuple => (jtuple.f0, jtuple.f1)) + } + + /** + * @return the edge DataSet as Tuple3. + */ + def getEdgesAsTuple3(): DataSet[(K, K, EV)] = { +wrap(jgraph.getEdgesAsTuple3).map(jtuple => (jtuple.f0, jtuple.f1, jtuple.f2)) + } + + /** + * Apply a function to the attribute of each vertex in the graph. + * + * @param mapper the map function to apply. + * @return a new graph + */ + def mapVertices[NV: TypeInformation : ClassTag](mapper: MapFunction[Vertex[K, VV], NV]): + Graph[K, NV, EV] = { +new Graph[K, NV, EV](jgraph.mapVertices[NV]( + mapper, + createTypeInformation[Vertex[K, NV]] +)) + } + + /** + * Apply a
[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/1004#discussion_r37273914 --- Diff: flink-staging/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala --- @@ -0,0 +1,735 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.scala + +import org.apache.flink.api.common.functions.{FilterFunction, MapFunction} +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.{tuple => jtuple} +import org.apache.flink.api.scala._ +import org.apache.flink.graph._ +import org.apache.flink.graph.gsa.{ApplyFunction, GSAConfiguration, GatherFunction, SumFunction} +import org.apache.flink.graph.spargel.{MessagingFunction, VertexCentricConfiguration, VertexUpdateFunction} +import org.apache.flink.{graph => jg} + +import _root_.scala.collection.JavaConverters._ +import _root_.scala.reflect.ClassTag + +object Graph { + def fromDataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: + TypeInformation : ClassTag](vertices: DataSet[Vertex[K, VV]], edges: DataSet[Edge[K, EV]], + env: ExecutionEnvironment): Graph[K, VV, EV] = { +wrapGraph(jg.Graph.fromDataSet[K, VV, EV](vertices.javaSet, edges.javaSet, env.getJavaEnv)) + } + + def fromCollection[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: + TypeInformation : ClassTag](vertices: Seq[Vertex[K, VV]], edges: Seq[Edge[K, EV]], env: + ExecutionEnvironment): Graph[K, VV, EV] = { +wrapGraph(jg.Graph.fromCollection[K, VV, EV](vertices.asJavaCollection, edges + .asJavaCollection, env.getJavaEnv)) + } +} + +/** + * Represents a graph consisting of {@link Edge edges} and {@link Vertex vertices}. + * @param jgraph the underlying java api Graph. 
+ * @tparam K the key type for vertex and edge identifiers + * @tparam VV the value type for vertices + * @tparam EV the value type for edges + * @see org.apache.flink.graph.Edge + * @see org.apache.flink.graph.Vertex + */ +final class Graph[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: +TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) { + + private[flink] def getWrappedGraph = jgraph + + + private[flink] def clean[F <: AnyRef](f: F, checkSerializable: Boolean = true): F = { +if (jgraph.getContext.getConfig.isClosureCleanerEnabled) { + ClosureCleaner.clean(f, checkSerializable) +} +ClosureCleaner.ensureSerializable(f) +f + } + + /** + * @return the vertex DataSet. + */ + def getVertices = wrap(jgraph.getVertices) + + /** + * @return the edge DataSet. + */ + def getEdges = wrap(jgraph.getEdges) + + /** + * @return the vertex DataSet as Tuple2. + */ + def getVerticesAsTuple2(): DataSet[(K, VV)] = { +wrap(jgraph.getVerticesAsTuple2).map(jtuple => (jtuple.f0, jtuple.f1)) + } + + /** + * @return the edge DataSet as Tuple3. + */ + def getEdgesAsTuple3(): DataSet[(K, K, EV)] = { +wrap(jgraph.getEdgesAsTuple3).map(jtuple => (jtuple.f0, jtuple.f1, jtuple.f2)) + } + + /** + * Apply a function to the attribute of each vertex in the graph. + * + * @param mapper the map function to apply. + * @return a new graph + */ + def mapVertices[NV: TypeInformation : ClassTag](mapper: MapFunction[Vertex[K, VV], NV]): + Graph[K, NV, EV] = { +new Graph[K, NV, EV](jgraph.mapVertices[NV]( + mapper, + createTypeInformation[Vertex[K, NV]] +)) + } + + /** + * Apply a function to the attribute of each vertex in the graph. + * + * @param fun the map function to apply. + * @return a new graph + */ + def mapVertices[NV: TypeInformation : ClassTag](fun: Vertex[K, VV] => NV): Graph[K, NV, EV] = {
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700920#comment-14700920 ] ASF GitHub Bot commented on FLINK-1962: --- Github user PieterJanVanAeken commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132118620 I fixed the unnecessary comment and did a quick rebase to keep up with the master branch. Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2477][Add]Add tests for SocketClientSin...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/977#issuecomment-132118720 Looks good and CI passes. Will merge this! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2477) Add test for SocketClientSink
[ https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700922#comment-14700922 ] ASF GitHub Bot commented on FLINK-2477: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/977#issuecomment-132118720 Looks good and CI passes. Will merge this! Add test for SocketClientSink - Key: FLINK-2477 URL: https://issues.apache.org/jira/browse/FLINK-2477 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Add some tests for SocketClientSink.
[jira] [Updated] (FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop
[ https://issues.apache.org/jira/browse/FLINK-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2540: --- Fix Version/s: 0.9.1 0.10 LocalBufferPool.requestBuffer gets into infinite loop - Key: FLINK-2540 URL: https://issues.apache.org/jira/browse/FLINK-2540 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Fix For: 0.10, 0.9.1 I'm trying to run a complicated computation that looks like this: [1]. One of the DataSource-Filter-Map chains finishes fine, but the other one freezes. Debugging shows that it is spinning in the while loop in LocalBufferPool.requestBuffer. askToRecycle is false. Both numberOfRequestedMemorySegments and currentPoolSize are 128, so it never enters either of those if branches. This is a stack trace: [2] And here is the code, if you would like to run it: [3]. Unfortunately, I can't make it more minimal, because if I remove some operators, the problem disappears. The class to start is malom.Solver. (On first run, it calculates some lookup tables for a few minutes, and puts them into /tmp/movegen) [1] http://compalg.inf.elte.hu/~ggevay/flink/plan.txt [2] http://compalg.inf.elte.hu/~ggevay/flink/stacktrace.txt [3] https://github.com/ggevay/flink/tree/deadlock-malom
[jira] [Created] (FLINK-2542) It should be documented that it is required from a join key to override hashCode(), when it is not a POJO
Gabor Gevay created FLINK-2542: -- Summary: It should be documented that it is required from a join key to override hashCode(), when it is not a POJO Key: FLINK-2542 URL: https://issues.apache.org/jira/browse/FLINK-2542 Project: Flink Issue Type: Bug Components: Gelly, Java API Reporter: Gabor Gevay Priority: Minor Fix For: 0.10, 0.9.1 If the join key is not a POJO, and does not override hashCode, then the join silently fails (produces empty output). I don't see this documented anywhere. The Gelly documentation should also have this info separately, because it does joins internally on the vertex IDs, but the user might not know this, or might not look at the join documentation when using Gelly. Here is some example code:
{noformat}
public static class ID implements Comparable<ID> {
    public long foo; // no default ctor -- not a POJO

    public ID(long foo) { this.foo = foo; }

    @Override
    public int compareTo(ID o) { return ((Long)foo).compareTo(o.foo); }

    @Override
    public boolean equals(Object o0) {
        if (o0 instanceof ID) {
            ID o = (ID)o0;
            return foo == o.foo;
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() { return 42; }
}

public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<ID, Long>> inDegrees = env.fromElements(Tuple2.of(new ID(123l), 4l));
    DataSet<Tuple2<ID, Long>> outDegrees = env.fromElements(Tuple2.of(new ID(123l), 5l));
    DataSet<Tuple3<ID, Long, Long>> degrees = inDegrees
        .join(outDegrees, JoinOperatorBase.JoinHint.REPARTITION_HASH_FIRST).where(0).equalTo(0)
        .with(new FlatJoinFunction<Tuple2<ID, Long>, Tuple2<ID, Long>, Tuple3<ID, Long, Long>>() {
            @Override
            public void join(Tuple2<ID, Long> first, Tuple2<ID, Long> second,
                             Collector<Tuple3<ID, Long, Long>> out) {
                out.collect(new Tuple3<ID, Long, Long>(first.f0, first.f1, second.f1));
            }
        }).withForwardedFieldsFirst("f0;f1").withForwardedFieldsSecond("f1");
    System.out.println("degrees count: " + degrees.count());
}
{noformat}
This prints 1, but if I comment out the hashCode, it prints 0.
[jira] [Commented] (FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop
[ https://issues.apache.org/jira/browse/FLINK-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701179#comment-14701179 ] Stephan Ewen commented on FLINK-2540: - I put Ufuk as responsible for that (hope that's okay). He knows that code best... LocalBufferPool.requestBuffer gets into infinite loop - Key: FLINK-2540 URL: https://issues.apache.org/jira/browse/FLINK-2540 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Assignee: Ufuk Celebi Priority: Blocker Fix For: 0.10, 0.9.1
[jira] [Updated] (FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop
[ https://issues.apache.org/jira/browse/FLINK-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2540: Priority: Blocker (was: Major) LocalBufferPool.requestBuffer gets into infinite loop - Key: FLINK-2540 URL: https://issues.apache.org/jira/browse/FLINK-2540 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Priority: Blocker Fix For: 0.10, 0.9.1
[jira] [Updated] (FLINK-2540) LocalBufferPool.requestBuffer gets into infinite loop
[ https://issues.apache.org/jira/browse/FLINK-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2540: Assignee: Ufuk Celebi LocalBufferPool.requestBuffer gets into infinite loop - Key: FLINK-2540 URL: https://issues.apache.org/jira/browse/FLINK-2540 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Assignee: Ufuk Celebi Priority: Blocker Fix For: 0.10, 0.9.1
[jira] [Commented] (FLINK-2508) Confusing sharing of StreamExecutionEnvironment
[ https://issues.apache.org/jira/browse/FLINK-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701196#comment-14701196 ] Márton Balassi commented on FLINK-2508: --- Thanks for both reports, Aljoscha. I am looking into how the batch side works to get the same behavior for streaming. Confusing sharing of StreamExecutionEnvironment --- Key: FLINK-2508 URL: https://issues.apache.org/jira/browse/FLINK-2508 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Márton Balassi Fix For: 0.10 In the {{StreamExecutionEnvironment}}, the environment is once created and then shared with a static variable to all successive calls to {{getExecutionEnvironment()}}. But it can be overridden by calls to {{createLocalEnvironment()}} and {{createRemoteEnvironment()}}. This seems a bit un-intuitive, and probably creates confusion when dispatching multiple streaming jobs from within the same JVM. Why is it even necessary to cache the current execution environment?
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132187018 Olrite. I'll be back in a few. :)
[jira] [Commented] (FLINK-2488) Expose attemptNumber in RuntimeContext
[ https://issues.apache.org/jira/browse/FLINK-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701218#comment-14701218 ] ASF GitHub Bot commented on FLINK-2488: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1026#discussion_r37294536 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/TaskRuntimeInfo.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Encapsulation of runtime information for a Task, including Task Name, sub-task index, etc. + * + */ +public class TaskRuntimeInfo implements java.io.Serializable { + + /** Task Name */ + private final String taskName; + + /** Index of this task in the parallel task group. Goes from 0 to {@link #numParallelTasks} - 1 */ + private final int subTaskIndex; + + /** Number of parallel running instances of this task */ + private final int numParallelTasks; + + /** Attempt number of this task */ + private final int attemptNumber; + + /** Task Name along with information on index of task in group */ + private transient final String taskNameWithSubTaskIndex; --- End diff -- If this is transient, it is null after deserialization. Either make it non-transient, or re-compute it after deserialization. Expose attemptNumber in RuntimeContext -- Key: FLINK-2488 URL: https://issues.apache.org/jira/browse/FLINK-2488 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.10 Reporter: Robert Metzger Assignee: Sachin Goel Priority: Minor It would be nice to expose the attemptNumber of a task in the {{RuntimeContext}}. This would allow user code to behave differently in restart scenarios.
[GitHub] flink pull request: [FLINK-2488][FLINK-2496] Expose Task Manager c...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1026#discussion_r37294536 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/TaskRuntimeInfo.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.api.common; + +/** + * Encapsulation of runtime information for a Task, including Task Name, sub-task index, etc. + * + */ +public class TaskRuntimeInfo implements java.io.Serializable { + + /** Task Name */ + private final String taskName; + + /** Index of this task in the parallel task group. Goes from 0 to {@link #numParallelTasks} - 1 */ + private final int subTaskIndex; + + /** Number of parallel running instances of this task */ + private final int numParallelTasks; + + /** Attempt number of this task */ + private final int attemptNumber; + + /** Task Name along with information on index of task in group */ + private transient final String taskNameWithSubTaskIndex; --- End diff -- If this is transient, it is null after deserialization. Either make it non-transient, or re-compute it after deserialization. 
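The remedy suggested in the review above can be sketched as follows. This is a hypothetical, minimal example (the class and field names only mirror the snippet under review; it is not Flink's actual code): a derived transient field is re-computed in a `readObject` hook so it is never observed as null after deserialization. Note that this approach also requires dropping `final` from the transient field, since `readObject` cannot reassign a final field.

```java
import java.io.*;

// Minimal sketch: re-compute a derived transient field after Java
// deserialization instead of leaving it null.
public class TaskInfo implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String taskName;
    private final int subTaskIndex;
    private final int numParallelTasks;

    // Derived from the fields above; excluded from the serialized form.
    // Deliberately NOT final, so readObject below can reassign it.
    private transient String taskNameWithSubTaskIndex;

    public TaskInfo(String taskName, int subTaskIndex, int numParallelTasks) {
        this.taskName = taskName;
        this.subTaskIndex = subTaskIndex;
        this.numParallelTasks = numParallelTasks;
        this.taskNameWithSubTaskIndex = buildName();
    }

    private String buildName() {
        return taskName + " (" + (subTaskIndex + 1) + "/" + numParallelTasks + ")";
    }

    public String getTaskNameWithSubTaskIndex() {
        return taskNameWithSubTaskIndex;
    }

    // Re-compute the transient field when an instance is read back in.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        this.taskNameWithSubTaskIndex = buildName();
    }

    public static void main(String[] args) throws Exception {
        TaskInfo original = new TaskInfo("map", 2, 4);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(original);
        TaskInfo copy = (TaskInfo) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        // Without the readObject hook this would print "null".
        System.out.println(copy.getTaskNameWithSubTaskIndex());  // prints "map (3/4)"
    }
}
```

The alternative mentioned in the review (making the field non-transient) trades a few extra serialized bytes for not needing the hook at all.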
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132183402 Is there a valid reason to have multiple names ('discrete histogram', 'categorical histogram')? I think that to avoid confusion, we need to unify the names.
[jira] [Created] (FLINK-2543) State handling does not support deserializing classes through the UserCodeClassloader
Robert Metzger created FLINK-2543: - Summary: State handling does not support deserializing classes through the UserCodeClassloader Key: FLINK-2543 URL: https://issues.apache.org/jira/browse/FLINK-2543 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9, 0.10 Reporter: Robert Metzger Priority: Blocker The current implementation of the state checkpointing does not support custom classes, because the UserCodeClassLoader is not used to deserialize the state. {code} Error: java.lang.RuntimeException: Failed to deserialize state handle and setup initial operator state. at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: com.ottogroup.bi.searchlab.searchsessionizer.OperatorState at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:63) at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:33) at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreInitialState(AbstractUdfStreamOperator.java:83) at org.apache.flink.streaming.runtime.tasks.StreamTask.setInitialState(StreamTask.java:276) at 
org.apache.flink.runtime.state.StateUtils.setOperatorState(StateUtils.java:51) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:541) {code} The issue has been reported by a user: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Class-for-state-checkpointing-td2415.html
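The usual fix for this class of failure, deserializing through a specific ClassLoader rather than the JVM default, can be sketched as below. This is an illustrative pattern under the assumption that the state bytes are read back with standard Java serialization; it is not Flink's actual patch, and the class name is made up.

```java
import java.io.*;

// Sketch: an ObjectInputStream that resolves classes against a supplied
// ClassLoader (e.g. a user-code classloader) before falling back to the
// default lookup, avoiding ClassNotFoundException for user classes.
public class ClassLoaderObjectInputStream extends ObjectInputStream {

    private final ClassLoader classLoader;

    public ClassLoaderObjectInputStream(InputStream in, ClassLoader classLoader)
            throws IOException {
        super(in);
        this.classLoader = classLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the provided loader first.
            return Class.forName(desc.getName(), false, classLoader);
        } catch (ClassNotFoundException e) {
            // Fall back to default resolution (handles primitives, etc.).
            return super.resolveClass(desc);
        }
    }

    public static void main(String[] args) throws Exception {
        // Round-trip a value through the classloader-aware stream.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(java.util.Arrays.asList("state"));
        Object back = new ClassLoaderObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()),
                Thread.currentThread().getContextClassLoader()).readObject();
        System.out.println(back);  // prints "[state]"
    }
}
```

In the reported stack trace, the failing call sits inside `ByteStreamStateHandle.getState`, which is exactly where such a loader-aware stream would need to be threaded in.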
[jira] [Assigned] (FLINK-2543) State handling does not support deserializing classes through the UserCodeClassloader
[ https://issues.apache.org/jira/browse/FLINK-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-2543: - Assignee: Robert Metzger State handling does not support deserializing classes through the UserCodeClassloader - Key: FLINK-2543 URL: https://issues.apache.org/jira/browse/FLINK-2543 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9, 0.10 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Blocker
[jira] [Commented] (FLINK-2508) Confusing sharing of StreamExecutionEnvironment
[ https://issues.apache.org/jira/browse/FLINK-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701175#comment-14701175 ] Aljoscha Krettek commented on FLINK-2508: - Another problem I discovered is that TestStreamEnvironment is polluting other tests. For example, in TranslationTest the test uses {{StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();}}. When I execute this standalone (or in an IDE) the call will return a LocalStreamEnvironment. When other tests are run before this they set some TestStreamEnvironment as context environment and then this call will pick up that TestStreamEnvironment. When this happens the execute call will also execute the leftover (lingering) operators from other tests. Confusing sharing of StreamExecutionEnvironment --- Key: FLINK-2508 URL: https://issues.apache.org/jira/browse/FLINK-2508 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Márton Balassi Fix For: 0.10
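The leakage described in this thread boils down to a static, sticky context object. A toy model (purely illustrative; these are not Flink's real classes) shows the behavior: once a context environment is installed, every later `getExecutionEnvironment()` call returns it until it is explicitly cleared.

```java
// Toy model of a static "context environment": once set, it shadows the
// default local environment for every subsequent lookup, which is how a
// test environment left over from one test can leak into the next.
public class EnvHolder {

    private static EnvHolder contextEnv;  // static => shared across "tests"

    private final String name;

    private EnvHolder(String name) {
        this.name = name;
    }

    // A test framework installs its environment as the context.
    public static void setAsContext(String name) {
        contextEnv = new EnvHolder(name);
    }

    // Without an explicit reset like this, the context lingers.
    public static void clearContext() {
        contextEnv = null;
    }

    public static EnvHolder getExecutionEnvironment() {
        // Sticky: returns whatever was set last, not a fresh environment.
        return contextEnv != null ? contextEnv : new EnvHolder("local");
    }

    public String getName() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(getExecutionEnvironment().getName());  // prints "local"
        setAsContext("TestStreamEnvironment");
        // A later, unrelated lookup now observes the leftover context.
        System.out.println(getExecutionEnvironment().getName());  // prints "TestStreamEnvironment"
    }
}
```

Returning a fresh environment from a factory on every call, as suggested for the batch side, removes exactly this stickiness.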
[GitHub] flink pull request: [FLINK-2521] [tests] Adds automatic test name ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1015
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132188989 @chiwanpark , modified the names.
[GitHub] flink pull request: [FLINK-2488][FLINK-2496] Expose Task Manager c...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132198739 Rebased to the latest master. Can be merged now.
[jira] [Commented] (FLINK-2488) Expose attemptNumber in RuntimeContext
[ https://issues.apache.org/jira/browse/FLINK-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701217#comment-14701217 ] ASF GitHub Bot commented on FLINK-2488: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132198739 Rebased to the latest master. Can be merged now. Expose attemptNumber in RuntimeContext -- Key: FLINK-2488 URL: https://issues.apache.org/jira/browse/FLINK-2488 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.10 Reporter: Robert Metzger Assignee: Sachin Goel Priority: Minor
[jira] [Created] (FLINK-2541) TypeComparator creation fails for Tuple2<Tuple1<byte[]>, byte[]>
Chesnay Schepler created FLINK-2541: --- Summary: TypeComparator creation fails for Tuple2<Tuple1<byte[]>, byte[]> Key: FLINK-2541 URL: https://issues.apache.org/jira/browse/FLINK-2541 Project: Flink Issue Type: Bug Reporter: Chesnay Schepler When running the following job as a JavaProgramTest:
{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<Tuple1<byte[]>, byte[]>> data = env.fromElements(
    new Tuple2<Tuple1<byte[]>, byte[]>(
        new Tuple1<byte[]>(new byte[]{1, 2}), new byte[]{1, 2, 3}),
    new Tuple2<Tuple1<byte[]>, byte[]>(
        new Tuple1<byte[]>(new byte[]{1, 2}), new byte[]{1, 2, 3}));
data.groupBy("f0.f0")
    .reduceGroup(new DummyReduce<Tuple2<Tuple1<byte[]>, byte[]>>())
    .print();
{code}
with DummyReduce defined as
{code}
public static class DummyReduce<IN> implements GroupReduceFunction<IN, IN> {
    @Override
    public void reduce(Iterable<IN> values, Collector<IN> out) throws Exception {
        for (IN value : values) {
            out.collect(value);
        }
    }
}
{code}
I encountered the following exception: Tuple comparator creation has a bug java.lang.IllegalArgumentException: Tuple comparator creation has a bug at org.apache.flink.api.java.typeutils.TupleTypeInfo.getNewComparator(TupleTypeInfo.java:131) at org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:133) at org.apache.flink.api.common.typeutils.CompositeType.createComparator(CompositeType.java:122) at org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.getTypeComparator(GroupReduceOperatorBase.java:155) at org.apache.flink.api.common.operators.base.GroupReduceOperatorBase.executeOnCollections(GroupReduceOperatorBase.java:184) at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:236) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143) at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:215) at 
org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:143) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125) at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:176) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:152) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:125) at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:109) at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:33) at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:35) at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:30) at org.apache.flink.api.java.DataSet.collect(DataSet.java:408) at org.apache.flink.api.java.DataSet.print(DataSet.java:1349) at org.apache.flink.languagebinding.api.java.python.AbstractPythonTest.testProgram(AbstractPythonTest.java:42) at org.apache.flink.test.util.JavaProgramTestBase.testJobCollectionExecution(JavaProgramTestBase.java:226) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) 
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at
[jira] [Updated] (FLINK-2541) TypeComparator creation fails for Tuple2<Tuple1<byte[]>, byte[]>
[ https://issues.apache.org/jira/browse/FLINK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-2541: -- Component/s: Java API
[jira] [Commented] (FLINK-2521) Add automatic test name logging for tests
[ https://issues.apache.org/jira/browse/FLINK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1470#comment-1470 ] ASF GitHub Bot commented on FLINK-2521: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1015 Add automatic test name logging for tests - Key: FLINK-2521 URL: https://issues.apache.org/jira/browse/FLINK-2521 Project: Flink Issue Type: Improvement Reporter: Till Rohrmann Assignee: Till Rohrmann Priority: Minor When running tests on travis the Flink components log to a file. This is helpful in case of a failed test to retrieve the error. However, the log does not contain the test name and the reason for the failure. Therefore it is difficult to find the log output which corresponds to the failed test. It would be nice to automatically add the test case information to the log. This would ease the debugging process big time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2521) Add automatic test name logging for tests
[ https://issues.apache.org/jira/browse/FLINK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-2521. Resolution: Fixed Added via 2f0412f163f4d37605188c8cc763111e0b51f0dc

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132186504 I'm also inclined to use Discrete. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132193111 They're not coupled at all. But both are related to statistics over data sets. This is why I combined them both in one. If you're wondering, there is a JIRA for the column wise statistics as well: [FLINK-2379]. Unless it is absolutely necessary, I'd like to keep them both in one.
[GitHub] flink pull request: [FLINK-2529][runtime]remove some unused code
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132196596 I'll merge half of this pull request...
[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD
[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701200#comment-14701200 ] Trevor Grant commented on FLINK-1994: - I more or less have this working with a few new optimizers based on sklearn- specifically the constant, optimal, and inverse scaling. Reading this paper to find others, as well as including bulleted ones in issue. Add different gain calculation schemes to SGD - Key: FLINK-1994 URL: https://issues.apache.org/jira/browse/FLINK-1994 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Trevor Grant Priority: Minor Labels: ML, Starter The current SGD implementation uses as gain for the weight updates the formula {{stepsize/sqrt(iterationNumber)}}. It would be good to make the gain calculation configurable and to provide different strategies for that. For example: * stepsize/(1 + iterationNumber) * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4) See also how to properly select the gains [1]. Resources: [1] http://arxiv.org/pdf/1107.2490.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
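The gain schedules proposed in the issue are simple to state as code. The sketch below is illustrative only (the class and method names are hypothetical, not Flink's optimizer API); it encodes the current default {{stepsize/sqrt(iterationNumber)}} alongside the two proposed alternatives:

```java
// Hypothetical gain (learning-rate) schedules for SGD, transcribing the
// formulas from FLINK-1994; names are illustrative, not Flink's actual API.
public class GainSchedules {

	// Current Flink SGD default: stepsize / sqrt(iterationNumber).
	static double sqrtDecay(double stepsize, int iteration) {
		return stepsize / Math.sqrt(iteration);
	}

	// Proposed: stepsize / (1 + iterationNumber).
	static double inverseDecay(double stepsize, int iteration) {
		return stepsize / (1 + iteration);
	}

	// Proposed: stepsize * (1 + regularization * stepsize * iterationNumber)^(-3/4).
	static double regularizedDecay(double stepsize, double regularization, int iteration) {
		return stepsize * Math.pow(1 + regularization * stepsize * iteration, -0.75);
	}
}
```

Making the schedule a strategy object (one method, taking the iteration number) is the natural way to make the gain calculation configurable as the issue asks.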
[jira] [Commented] (FLINK-2529) fix on some unused code for flink-runtime
[ https://issues.apache.org/jira/browse/FLINK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701198#comment-14701198 ] ASF GitHub Bot commented on FLINK-2529: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132196596 I'll merge half of this pull request... fix on some unused code for flink-runtime - Key: FLINK-2529 URL: https://issues.apache.org/jira/browse/FLINK-2529 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h In file BlobServer.java, I found that Thread.currentThread() will never return null, to my knowledge. So I think the {{shutdownHook != null}} check is not necessary in the code {{if (shutdownHook != null && shutdownHook != Thread.currentThread())}}. And I am not completely sure; maybe I am wrong.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
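For context on the quoted check: the null guard protects against a *hook field* that was never assigned (registration can fail), not against {{Thread.currentThread()}} being null. A minimal sketch of the pattern, with illustrative names (this is not BlobServer's actual code):

```java
// Sketch of the shutdown-hook pattern under discussion. The null check guards
// a hook that was never registered; the identity check avoids removing the
// hook from inside the hook thread itself. Names are illustrative.
public class ShutdownHookGuard {

	private Thread shutdownHook; // stays null if registration failed

	public void register() {
		try {
			shutdownHook = new Thread(() -> { /* cleanup work */ });
			Runtime.getRuntime().addShutdownHook(shutdownHook);
		} catch (IllegalStateException | SecurityException e) {
			shutdownHook = null; // JVM already shutting down, or not permitted
		}
	}

	public boolean shutdown() {
		// Both conditions matter: the hook may be null (never registered),
		// and removeShutdownHook must not be called from the hook itself.
		if (shutdownHook != null && shutdownHook != Thread.currentThread()) {
			boolean removed = Runtime.getRuntime().removeShutdownHook(shutdownHook);
			shutdownHook = null;
			return removed;
		}
		return false;
	}
}
```

So while {{Thread.currentThread()}} indeed never returns null, the null check is still load-bearing for the unregistered-hook case.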
[GitHub] flink pull request: [FLINK-2488][FLINK-2496] Expose Task Manager c...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132199820 Fixed.
[jira] [Commented] (FLINK-2488) Expose attemptNumber in RuntimeContext
[ https://issues.apache.org/jira/browse/FLINK-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701227#comment-14701227 ] ASF GitHub Bot commented on FLINK-2488: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132199820 Fixed. Expose attemptNumber in RuntimeContext -- Key: FLINK-2488 URL: https://issues.apache.org/jira/browse/FLINK-2488 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.10 Reporter: Robert Metzger Assignee: Sachin Goel Priority: Minor It would be nice to expose the attemptNumber of a task in the {{RuntimeContext}}. This would allow user code to behave differently in restart scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
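What the issue enables is user code branching on the attempt number. A tiny sketch (in real Flink code the value would come from the proposed RuntimeContext accessor; here it is just a parameter, and all names are illustrative):

```java
// Sketch: user code behaving differently in restart scenarios, as FLINK-2488
// proposes. The attempt number would come from RuntimeContext; names are
// illustrative, not Flink API.
public class AttemptAwareSetup {

	static String openMode(int attemptNumber) {
		// attempt 0 is the first execution; anything higher is a restart
		return attemptNumber == 0 ? "fresh-start" : "recover-from-checkpoint";
	}
}
```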
[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD
[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701225#comment-14701225 ] Till Rohrmann commented on FLINK-1994: -- Sounds great [~rawkintrevo]. When it's ready, then you should open a PR against Flink's master branch on github.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2545) NegativeArraySizeException while creating hash table bloom filters
Greg Hogan created FLINK-2545: - Summary: NegativeArraySizeException while creating hash table bloom filters Key: FLINK-2545 URL: https://issues.apache.org/jira/browse/FLINK-2545 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Greg Hogan

The following exception occurred a second time when I immediately re-ran my application, though after recompiling and restarting Flink the subsequent execution ran without error.
{code}
java.lang.Exception: The data preparation for task '...' , caused an error: null
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:465)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:354)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildBloomFilterForBucket(MutableHashTable.java:1160)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildBloomFilterForBucketsInPartition(MutableHashTable.java:1143)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1117)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.insertBucketEntry(MutableHashTable.java:946)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.insertIntoTable(MutableHashTable.java:868)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildInitialTable(MutableHashTable.java:692)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.open(MutableHashTable.java:455)
	at org.apache.flink.runtime.operators.hash.ReusingBuildSecondHashMatchIterator.open(ReusingBuildSecondHashMatchIterator.java:93)
	at org.apache.flink.runtime.operators.JoinDriver.prepare(JoinDriver.java:195)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:459)
	... 3 more
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2545) NegativeArraySizeException while creating hash table bloom filters
[ https://issues.apache.org/jira/browse/FLINK-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701802#comment-14701802 ] Stephan Ewen commented on FLINK-2545: - Thanks for reporting this! As a quick fix, you can disable bloom filters by adding {{taskmanager.runtime.hashjoin-bloom-filters: false}} to the Flink config. See here for a reference: https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#runtime-algorithms The bloom filters are a relatively new addition in 0.10.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
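For copy-paste convenience, the workaround Stephan quotes is a single configuration entry (the key is taken verbatim from the comment and applies to the 0.10-era config):

```yaml
# flink-conf.yaml: disable hash-join bloom filters as a workaround for FLINK-2545
taskmanager.runtime.hashjoin-bloom-filters: false
```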
[jira] [Commented] (FLINK-2488) Expose attemptNumber in RuntimeContext
[ https://issues.apache.org/jira/browse/FLINK-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701814#comment-14701814 ] ASF GitHub Bot commented on FLINK-2488: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132322001 Sorry to be picky again, but it would be good to harmonize the code style a bit and make it more consistent with the remaining code. What struck me in particular was the omission of the space before the curly braces at the start of methods and other code blocks; all other parts of the code have that. There are ongoing discussions about making the code style stricter (also with respect to white spaces), and this is much easier if new code follows this standard already. Otherwise, there are going to be a lot of style adjustments necessary when introducing the check.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2488][FLINK-2496] Expose Task Manager c...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132322001 Sorry to be picky again, but it would be good to harmonize the code style a bit and make it more consistent with the remaining code. What struck me in particular was the omission of the space before the curly braces at the start of methods and other code blocks; all other parts of the code have that. There are ongoing discussions about making the code style stricter (also with respect to white spaces), and this is much easier if new code follows this standard already. Otherwise, there are going to be a lot of style adjustments necessary when introducing the check.
[GitHub] flink pull request: Path refactor
GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1034 Path refactor You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink path_refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1034.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1034 commit 526a50a4ba38ae84574c31a6e67b39ac14649d44 Author: gallenvara gallenv...@126.com Date: 2015-08-18T10:31:40Z [Flink-2077] Clean and refactor the Path class. commit 4fb1b03f89d1a3a2793e957ff0132ba70bffb3a6 Author: gallenvara gallenv...@126.com Date: 2015-08-19T01:41:55Z [Flink-2077] Clean and refactor the Path class. commit 380ed45473f51ff2835bf8fcaa00e45c7b0b Author: gallenvara gallenv...@126.com Date: 2015-08-19T02:46:46Z [Flink-2077] Clean and refactor the Path class.
[GitHub] flink pull request: [FLINK-2077] [core] Rework Path class and add ...
GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1035 [FLINK-2077] [core] Rework Path class and add extend support for Windows paths The class org.apache.flink.core.fs.Path handles paths for Flink's FileInputFormat and FileOutputFormat. Over time, this class has become quite hard to read and modify. This PR does some work on cleaning and refactoring. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink path_refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1035.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1035 commit 526a50a4ba38ae84574c31a6e67b39ac14649d44 Author: gallenvara gallenv...@126.com Date: 2015-08-18T10:31:40Z [Flink-2077] Clean and refactor the Path class. commit 4fb1b03f89d1a3a2793e957ff0132ba70bffb3a6 Author: gallenvara gallenv...@126.com Date: 2015-08-19T01:41:55Z [Flink-2077] Clean and refactor the Path class. commit 380ed45473f51ff2835bf8fcaa00e45c7b0b Author: gallenvara gallenv...@126.com Date: 2015-08-19T02:46:46Z [Flink-2077] Clean and refactor the Path class. commit b7c770df6b2b255a30b5af5589544bf1597f32f5 Author: gallenvara gallenv...@126.com Date: 2015-08-19T03:22:41Z [FLINK-2077] [core] Rework Path class and add extend support for Windows paths
[jira] [Issue Comment Deleted] (FLINK-2366) HA Without ZooKeeper
[ https://issues.apache.org/jira/browse/FLINK-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suminda Dharmasena updated FLINK-2366: -- Comment: was deleted (was: OK. As long as it is *easily pluggable* it is OK. Also another established project: https://github.com/belaban/JGroups) HA Without ZooKeeper Key: FLINK-2366 URL: https://issues.apache.org/jira/browse/FLINK-2366 Project: Flink Issue Type: Improvement Reporter: Suminda Dharmasena Priority: Minor Please provide a way to do HA without having to use ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2366) HA Without ZooKeeper
[ https://issues.apache.org/jira/browse/FLINK-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702379#comment-14702379 ] Suminda Dharmasena commented on FLINK-2366: --- OK. As long as it is *easily pluggable* it is OK. Also another established project: https://github.com/belaban/JGroups HA Without ZooKeeper Key: FLINK-2366 URL: https://issues.apache.org/jira/browse/FLINK-2366 Project: Flink Issue Type: Improvement Reporter: Suminda Dharmasena Priority: Minor Please provide a way to do HA without having to use ZK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132439907 Including the collect step in the documentation seems weird, IMO. Since the return type is a `DataSet[Histogram]`, anyone familiar with flink will know to collect it first before operating on the element itself. Of course, one other option would be actually collect and return the histogram in the `create` functions, but we don't wanna do that in case it is used inside an iteration. Should I include the collect step in the documentation then?
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702304#comment-14702304 ] ASF GitHub Bot commented on FLINK-1901: --- Github user ChengXiangLi commented on a diff in the pull request: https://github.com/apache/flink/pull/949#discussion_r37373497

--- Diff: flink-core/src/main/java/org/apache/flink/api/common/operators/util/PoissonSampler.java ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.common.operators.util;
+
+import com.google.common.base.Preconditions;
+import org.apache.commons.math3.distribution.PoissonDistribution;
+
+import java.util.Iterator;
+
+/**
+ * A sampler implementation based on Poisson Distribution. While sampling elements with fraction and replacement,
+ * the selected number of each element follows a given poisson distribution, so we could use poisson
+ * distribution to generate random variables for sample.
+ *
+ * @param <T> The type of sample.
+ * @see <a href="https://en.wikipedia.org/wiki/Poisson_distribution">https://en.wikipedia.org/wiki/Poisson_distribution</a>
+ */
+public class PoissonSampler<T> extends RandomSampler<T> {
+
+	private PoissonDistribution poissonDistribution;
+	private final double fraction;
+
+	/**
+	 * Create a poisson sampler which would sample elements with replacement.
+	 *
+	 * @param fraction The expected count of each element.
+	 * @param seed     Random number generator seed for internal PoissonDistribution.
+	 */
+	public PoissonSampler(double fraction, long seed) {
+		Preconditions.checkArgument(fraction >= 0, "fraction should be positive.");
+		this.fraction = fraction;
+		if (this.fraction > 0) {
+			this.poissonDistribution = new PoissonDistribution(fraction);
+			this.poissonDistribution.reseedRandomGenerator(seed);
+		}
+	}
+
+	/**
+	 * Create a poisson sampler which would sample elements with replacement.
+	 *
+	 * @param fraction The expected count of each element.
+	 */
+	public PoissonSampler(double fraction) {
+		Preconditions.checkArgument(fraction >= 0, "fraction should be non-negative.");
+		this.fraction = fraction;
+		if (this.fraction > 0) {
+			this.poissonDistribution = new PoissonDistribution(fraction);
+		}
+	}
+
+	/**
+	 * Sample the input elements; for each input element, generate its count with poisson distribution random variables generation.
+	 *
+	 * @param input Elements to be sampled.
+	 * @return The sampled result which is lazy computed upon input elements.
+	 */
+	@Override
+	public Iterator<T> sample(final Iterator<T> input) {
+		if (fraction == 0) {
+			return EMPTY_ITERABLE;
+		}
+
+		return new SampledIterator<T>() {
+			T currentElement;
+			int currentCount = 0;
+
+			@Override
+			public boolean hasNext() {
+				if (currentElement == null || currentCount == 0) {
+					while (input.hasNext()) {
+						currentElement = input.next();
+						currentCount = poissonDistribution.sample();
+						if (currentCount > 0) {
+							return true;
+						}
+					}
+					return false;
+				}
+				return true;
+			}
+
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on a diff in the pull request: https://github.com/apache/flink/pull/949#discussion_r37373497 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/operators/util/PoissonSampler.java ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.common.operators.util;
+
+import com.google.common.base.Preconditions;
+import org.apache.commons.math3.distribution.PoissonDistribution;
+
+import java.util.Iterator;
+
+/**
+ * A sampler implementation based on Poisson Distribution. While sampling elements with fraction and replacement,
+ * the selected number of each element follows a given poisson distribution, so we could use poisson
+ * distribution to generate random variables for sample.
+ *
+ * @param <T> The type of sample.
+ * @see <a href="https://en.wikipedia.org/wiki/Poisson_distribution">https://en.wikipedia.org/wiki/Poisson_distribution</a>
+ */
+public class PoissonSampler<T> extends RandomSampler<T> {
+
+	private PoissonDistribution poissonDistribution;
+	private final double fraction;
+
+	/**
+	 * Create a poisson sampler which would sample elements with replacement.
+	 *
+	 * @param fraction The expected count of each element.
+	 * @param seed Random number generator seed for internal PoissonDistribution.
+	 */
+	public PoissonSampler(double fraction, long seed) {
+		Preconditions.checkArgument(fraction >= 0, "fraction should be positive.");
+		this.fraction = fraction;
+		if (this.fraction > 0) {
+			this.poissonDistribution = new PoissonDistribution(fraction);
+			this.poissonDistribution.reseedRandomGenerator(seed);
+		}
+	}
+
+	/**
+	 * Create a poisson sampler which would sample elements with replacement.
+	 *
+	 * @param fraction The expected count of each element.
+	 */
+	public PoissonSampler(double fraction) {
+		Preconditions.checkArgument(fraction >= 0, "fraction should be non-negative.");
+		this.fraction = fraction;
+		if (this.fraction > 0) {
+			this.poissonDistribution = new PoissonDistribution(fraction);
+		}
+	}
+
+	/**
+	 * Sample the input elements, for each input element, generate its count with poisson distribution random variables generation.
+	 *
+	 * @param input Elements to be sampled.
+	 * @return The sampled result which is lazy computed upon input elements.
+	 */
+	@Override
+	public Iterator<T> sample(final Iterator<T> input) {
+		if (fraction == 0) {
+			return EMPTY_ITERABLE;
+		}
+
+		return new SampledIterator<T>() {
+			T currentElement;
+			int currentCount = 0;
+
+			@Override
+			public boolean hasNext() {
+				if (currentElement == null || currentCount == 0) {
+					while (input.hasNext()) {
+						currentElement = input.next();
+						currentCount = poissonDistribution.sample();
+						if (currentCount > 0) {
+							return true;
+						}
+					}
+					return false;
+				}
+				return true;
+			}
+
+			@Override
+			public T next() {
+				T result = currentElement;
+				if (currentCount == 0) {
+					currentElement =
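Stripped of Flink's surrounding interfaces, the sampling scheme in the diff above can be sketched as a self-contained class: for each input element, draw a count from Poisson(fraction) and repeat the element that many times. `PoissonSketch` and its method names are illustrative, not Flink API; Knuth's algorithm stands in for commons-math's `PoissonDistribution` so the sketch needs no external dependency.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Illustrative sketch of Poisson sampling with replacement (not the Flink class).
public class PoissonSketch {

    // Knuth's method: multiply uniforms until the product drops below e^-lambda.
    static int poisson(double lambda, Random rnd) {
        double limit = Math.exp(-lambda);
        int k = 0;
        double p = 1.0;
        do {
            k++;
            p *= rnd.nextDouble();
        } while (p > limit);
        return k - 1;
    }

    static <T> List<T> sample(Iterator<T> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        if (fraction == 0) {
            return out;  // mirrors the EMPTY_ITERABLE shortcut in the diff
        }
        while (input.hasNext()) {
            T element = input.next();
            int count = poisson(fraction, rnd);
            for (int i = 0; i < count; i++) {
                out.add(element);  // with replacement: an element may appear repeatedly
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            data.add(i);
        }
        List<Integer> sampled = sample(data.iterator(), 0.1, 42L);
        System.out.println("sample size: " + sampled.size());
    }
}
```

Because each element's count is drawn independently, the total sample size is itself Poisson-distributed with mean `fraction * n` (here about 1000), so any check on it should use a loose band rather than an exact value.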
[GitHub] flink pull request: Add Scala version of GraphAlgorithm
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/1033#discussion_r37376484 --- Diff: flink-staging/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/GraphAlgorithm.scala ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.scala
+
+/**
+ * @tparam K key type
+ * @tparam VV vertex value type
+ * @tparam EV edge value type
+ */
+abstract class GraphAlgorithm[K, VV, EV] {
+
+  def run(graph: Graph[K, VV, EV]): Graph[K,VV,EV]
--- End diff -- Please add space after comma :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Add Scala version of GraphAlgorithm
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/1033#discussion_r37376469 --- Diff: flink-staging/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala --- @@ -649,10 +650,13 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) { jtuple.f1)) } - def run(algorithm: GraphAlgorithm[K, VV, EV]) = { + def run(algorithm: JGraphAlgorithm[K, VV, EV]): Graph[K,VV,EV] = { --- End diff -- Please add space after comma :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Add Scala version of GraphAlgorithm
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/1033#discussion_r37376477 --- Diff: flink-staging/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala --- @@ -649,10 +650,13 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) { jtuple.f1)) } - def run(algorithm: GraphAlgorithm[K, VV, EV]) = { + def run(algorithm: JGraphAlgorithm[K, VV, EV]): Graph[K,VV,EV] = { wrapGraph(jgraph.run(algorithm)) } + def run(algorithm: GraphAlgorithm[K,VV,EV]): Graph[K,VV,EV] = { --- End diff -- Please add space after comma. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132430686 The only problem blocking the merge is an invalid example in the documentation. We need to call the `collect()` method and `apply(0)` to execute statistics methods such as `quantile`, `sum`, etc. Other things seem okay. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2077) Rework Path class and add extend support for Windows paths
[ https://issues.apache.org/jira/browse/FLINK-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702375#comment-14702375 ] ASF GitHub Bot commented on FLINK-2077: --- GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1035 [FLINK-2077] [core] Rework Path class and add extend support for Windows paths The class org.apache.flink.core.fs.Path handles paths for Flink's FileInputFormat and FileOutputFormat. Over time, this class has become quite hard to read and modify. This PR does some work on cleaning and refactoring. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink path_refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1035.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1035 commit 526a50a4ba38ae84574c31a6e67b39ac14649d44 Author: gallenvara gallenv...@126.com Date: 2015-08-18T10:31:40Z [Flink-2077] Clean and refactor the Path class. commit 4fb1b03f89d1a3a2793e957ff0132ba70bffb3a6 Author: gallenvara gallenv...@126.com Date: 2015-08-19T01:41:55Z [Flink-2077] Clean and refactor the Path class. commit 380ed45473f51ff2835bf8fcaa00e45c7b0b Author: gallenvara gallenv...@126.com Date: 2015-08-19T02:46:46Z [Flink-2077] Clean and refactor the Path class.
commit b7c770df6b2b255a30b5af5589544bf1597f32f5 Author: gallenvara gallenv...@126.com Date: 2015-08-19T03:22:41Z [FLINK-2077] [core] Rework Path class and add extend support for Windows paths Rework Path class and add extend support for Windows paths -- Key: FLINK-2077 URL: https://issues.apache.org/jira/browse/FLINK-2077 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: GaoLun Priority: Minor Labels: starter The class {{org.apache.flink.core.fs.Path}} handles paths for Flink's {{FileInputFormat}} and {{FileOutputFormat}}. Over time, this class has become quite hard to read and modify. It would benefit from some cleaning and refactoring. Along with the refactoring, support for Windows paths like {{//host/dir1/dir2}} could be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
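As a rough illustration of what "support for Windows paths" involves, the sketch below shows the two path shapes such a class must recognize: UNC-style paths like //host/dir1/dir2 and drive-letter paths like C:/dir. The class and method names are hypothetical and are not the actual org.apache.flink.core.fs.Path logic from the PR.

```java
// Hypothetical sketch of Windows-path detection, not Flink's actual Path class.
public class WindowsPathSketch {

    // UNC paths start with a double slash (or double backslash) followed by a host name.
    static boolean isUncPath(String path) {
        return path.length() > 2
                && (path.startsWith("//") || path.startsWith("\\\\"))
                && path.charAt(2) != '/' && path.charAt(2) != '\\';
    }

    // Drive-letter paths, e.g. "C:/users" or "/C:/users" after URI-style normalization.
    static boolean hasWindowsDrive(String path) {
        int start = path.startsWith("/") ? 1 : 0;
        return path.length() >= start + 2
                && Character.isLetter(path.charAt(start))
                && path.charAt(start + 1) == ':';
    }

    public static void main(String[] args) {
        System.out.println(isUncPath("//host/dir1/dir2"));      // prints true
        System.out.println(hasWindowsDrive("C:/Users/flink"));  // prints true
        System.out.println(hasWindowsDrive("/usr/local"));      // prints false
    }
}
```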
[jira] [Commented] (FLINK-2488) Expose attemptNumber in RuntimeContext
[ https://issues.apache.org/jira/browse/FLINK-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702524#comment-14702524 ] ASF GitHub Bot commented on FLINK-2488: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132453057 @StephanEwen , I have fixed the styling issues as much as I could find out. Thanks for your patience. :) Expose attemptNumber in RuntimeContext -- Key: FLINK-2488 URL: https://issues.apache.org/jira/browse/FLINK-2488 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.10 Reporter: Robert Metzger Assignee: Sachin Goel Priority: Minor It would be nice to expose the attemptNumber of a task in the {{RuntimeContext}}. This would allow user code to behave differently in restart scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2488][FLINK-2496] Expose Task Manager c...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132453057 @StephanEwen , I have fixed the styling issues as much as I could find out. Thanks for your patience. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-132449023 Hi @chiwanpark , I have modified the documentation to include the collect step. I have also made a few modifications to how the histogram is created, by using a `MapPartition` function instead of `Map`. Also, for more consistency, I changed the signature of `DataStats` creation to return `DataSet[DataStats]` instead of `DataStats`. @thvasilo , I have integrated the statistics part with this. I hope that is alright. Please have a look at `MLUtils.scala` to see the usage. If it's not, I'd be happy to remove it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2488][FLINK-2496] Expose Task Manager c...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132449408 Sure thing. I'll fix it. IntelliJ re-formats used to take care of such things, but I've stopped using it recently as it was messing up scala code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2488) Expose attemptNumber in RuntimeContext
[ https://issues.apache.org/jira/browse/FLINK-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702475#comment-14702475 ] ASF GitHub Bot commented on FLINK-2488: --- Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1026#issuecomment-132449408 Sure thing. I'll fix it. IntelliJ re-formats used to take care of such things, but I've stopped using it recently as it was messing up scala code. Expose attemptNumber in RuntimeContext -- Key: FLINK-2488 URL: https://issues.apache.org/jira/browse/FLINK-2488 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.10 Reporter: Robert Metzger Assignee: Sachin Goel Priority: Minor It would be nice to expose the attemptNumber of a task in the {{RuntimeContext}}. This would allow user code to behave differently in restart scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2532) The function name and the variable name are same, it is confusing
[ https://issues.apache.org/jira/browse/FLINK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi closed FLINK-2532. - Resolution: Fixed Fix Version/s: 0.10 Via a9d55d3 The function name and the variable name are same, it is confusing - Key: FLINK-2532 URL: https://issues.apache.org/jira/browse/FLINK-2532 Project: Flink Issue Type: Bug Reporter: zhangrucong Priority: Minor Fix For: 0.10 In class StreamWindow, in the function split, there is a list variable also called split. The readability is poor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
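The issue above is easy to reproduce in miniature: a method named split whose local result list is also called "split" reads poorly, and the fix is simply renaming the local. The class below is an illustrative stand-in with hypothetical names, not the actual StreamWindow code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the FLINK-2532 readability fix: the local list inside
// split() is renamed from "split" to "partitions" so the method name and
// the variable no longer shadow each other in the reader's mind.
public class WindowSplitExample {

    // Round-robin the window's elements into n partitions.
    public static List<List<Integer>> split(List<Integer> window, int n) {
        List<List<Integer>> partitions = new ArrayList<>();  // was confusingly named "split"
        for (int i = 0; i < n; i++) {
            partitions.add(new ArrayList<Integer>());
        }
        for (int i = 0; i < window.size(); i++) {
            partitions.get(i % n).add(window.get(i));
        }
        return partitions;
    }
}
```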
[jira] [Commented] (FLINK-2477) Add test for SocketClientSink
[ https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701357#comment-14701357 ] ASF GitHub Bot commented on FLINK-2477: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/977 Add test for SocketClientSink - Key: FLINK-2477 URL: https://issues.apache.org/jira/browse/FLINK-2477 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Add some tests for SocketClientSink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701355#comment-14701355 ] ASF GitHub Bot commented on FLINK-1962: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1004 Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2529][runtime]remove some unused code
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1022 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2529) fix on some unused code for flink-runtime
[ https://issues.apache.org/jira/browse/FLINK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701356#comment-14701356 ] ASF GitHub Bot commented on FLINK-2529: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1022 fix on some unused code for flink-runtime - Key: FLINK-2529 URL: https://issues.apache.org/jira/browse/FLINK-2529 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h In file BlobServer.java, to my knowledge, Thread.currentThread() will never return null. So I think the check "shutdownHook != null" is not necessary in the code 'if (shutdownHook != null && shutdownHook != Thread.currentThread())'. I am not completely sure; maybe I am wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1004 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Add Scala version of GraphAlgorithm
GitHub user PieterJanVanAeken opened a pull request: https://github.com/apache/flink/pull/1033 Add Scala version of GraphAlgorithm The Scala Gelly API already has a wrapper for running java-based GraphAlgorithms. It is needed for running pre-built algorithms such as the PageRankAlgorithm. This small addition maintains compatibility with these java-based GraphAlgorithms while giving the user an additional possibility of developing their own GraphAlgorithm that makes use of the syntactic sugar methods that the new Scala Gelly API offers. You can merge this pull request into a Git repository by running: $ git pull https://github.com/PieterJanVanAeken/flink scala-gelly-algorithm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1033.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1033 commit 0a5c46314ba2cf1e4bb8d7703b17c4cde1fbca51 Author: Pieter-Jan Van Aeken pieterjan.vanae...@euranova.eu Date: 2015-08-18T14:48:44Z Add Scala version of Graph Algorithm that allows for syntactic sugar inside the algorithm. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Add Scala version of GraphAlgorithm
Github user PieterJanVanAeken commented on the pull request: https://github.com/apache/flink/pull/1033#issuecomment-132241973 Please wait a sec with this one. Something is off with my local changes vs my repo changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132241841 If the APIs are in sync there should be one documentation for both, right? Similar as with the other APIs... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701396#comment-14701396 ] ASF GitHub Bot commented on FLINK-1962: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1004#issuecomment-132241841 If the APIs are in sync there should be one documentation for both, right? Similar as with the other APIs... Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken Fix For: 0.10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2529][runtime]remove some unused code
Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132205784 Ha... Sorry for my English, just misunderstood the execution. I will learn more about the Thread class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2529) fix on some unused code for flink-runtime
[ https://issues.apache.org/jira/browse/FLINK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701246#comment-14701246 ] ASF GitHub Bot commented on FLINK-2529: --- Github user HuangWHWHW commented on the pull request: https://github.com/apache/flink/pull/1022#issuecomment-132205784 Ha... Sorry for my English, I just misunderstood the execution. I will learn more about the Thread class. fix on some unused code for flink-runtime - Key: FLINK-2529 URL: https://issues.apache.org/jira/browse/FLINK-2529 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.10 Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h In file BlobServer.java, to my knowledge, Thread.currentThread() will never return null. So I think the check "shutdownHook != null" is not necessary in the code 'if (shutdownHook != null && shutdownHook != Thread.currentThread())'. I am not completely sure; maybe I am wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
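The guard discussed above can be isolated into a tiny sketch: when a JVM shutdown hook fires, the hook itself is the current thread, so the removal path must be skipped, while the null check still matters because the hook field may never have been installed. The class and method names below are illustrative, not BlobServer's actual code.

```java
// Illustration of the shutdown-hook guard from the FLINK-2529 discussion.
public class ShutdownHookGuard {

    // Remove only a hook that exists and is not the thread running right now.
    static boolean shouldRemoveHook(Thread shutdownHook) {
        return shutdownHook != null && shutdownHook != Thread.currentThread();
    }

    public static void main(String[] args) {
        Thread hook = new Thread();
        System.out.println(shouldRemoveHook(null));                    // prints false: no hook installed
        System.out.println(shouldRemoveHook(hook));                    // prints true: normal shutdown path
        System.out.println(shouldRemoveHook(Thread.currentThread()));  // prints false: inside the hook itself
    }
}
```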