[jira] [Commented] (FLINK-1295) Add option to Flink client to start a YARN session per job
[ https://issues.apache.org/jira/browse/FLINK-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271472#comment-14271472 ]

ASF GitHub Bot commented on FLINK-1295:
---

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/292

    [FLINK-1295][FLINK-883] Allow to deploy 'job only' YARN cluster. Add tests to YARN

    - Users can now also deploy Flink on YARN for executing a single job.
    - The flink-yarn project has been moved out of the flink-addons module.
    - The MiniYARNCluster is used for testing Flink on YARN.
    - There is now an (undocumented) Java interface to Flink's YARN client, allowing users to manually control the YARN session.
    - ALL ports used by Flink when running on YARN are automatically determined. In the past, users reported problems with blocked ports (YARN tells the client the RPC address of the application master).
    - The checks before deployment have been improved to give better error messages if the user requests too many resources for a YARN session.

    The change has been tested on Google Compute Cloud (click to deploy with Google Cloud Storage, not HDFS) and Amazon EMR (HDFS).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink flink1295-flink883-rebased-after-akka

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/292.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #292

commit 273cafbf44b5796b96ff2396b0f6e68f79ce9185
Author: Robert Metzger <rmetz...@apache.org>
Date:   2014-12-01T17:59:49Z

    [FLINK-1295][FLINK-883] Allow to deploy 'job only' YARN cluster. Add tests to YARN

    - Users can now also deploy Flink on YARN for executing a single job.
    - The flink-yarn project has been moved out of the flink-addons module.
    - The MiniYARNCluster is used for testing Flink on YARN.
    - There is now an (undocumented) Java interface to Flink's YARN client, allowing users to manually control the YARN session.
    - ALL ports used by Flink when running on YARN are automatically determined. In the past, users reported problems with blocked ports (YARN tells the client the RPC address of the application master).
    - The checks before deployment have been improved to give better error messages if the user requests too many resources for a YARN session.

Add option to Flink client to start a YARN session per job
----------------------------------------------------------

Key: FLINK-1295
URL: https://issues.apache.org/jira/browse/FLINK-1295
Project: Flink
Issue Type: New Feature
Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger

Currently, Flink users can only launch Flink on YARN as a YARN session (meaning a long-running YARN application that can run multiple Flink jobs). Users have requested to extend the Flink client to allocate YARN containers only for executing a single job. As part of this pull request, I would suggest refactoring the YARN client to make it more modular and object-oriented.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-1229) Synchronize WebClient arguments with command line arguments
[ https://issues.apache.org/jira/browse/FLINK-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271484#comment-14271484 ]

ASF GitHub Bot commented on FLINK-1229:
---

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/291#discussion_r22725429

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/web/JobSubmissionServlet.java ---
    @@ -146,10 +146,22 @@ protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws Se
     		}
     		String assemblerClass = null;
    -		if (params.size() >= 2 && params.get(0).equals("assembler")) {
    -			assemblerClass = params.get(1);
    -			params.remove(0);
    -			params.remove(0);
    +		int pos = 0;
    +		int parallelism = -1;
    +		while (pos < params.size()) {
    +			if (params.get(pos).equals("-c")) {
    +				assemblerClass = params.get(pos + 1);
    +				params.remove(pos);
    +				params.remove(pos);
    +			}
    +			else if (params.get(pos).equals("-p")) {
    +				parallelism = Integer.parseInt(params.get(pos + 1));
    +				params.remove(pos);
    +				params.remove(pos);
    +			}
    +			else {
    +				pos++;
    --- End diff --

    If you change this to `break`, it will only accept these flags before the user program arguments (not mixed with them), which makes sense and reflects the way the command line handles it.

Synchronize WebClient arguments with command line arguments
-----------------------------------------------------------

Key: FLINK-1229
URL: https://issues.apache.org/jira/browse/FLINK-1229
Project: Flink
Issue Type: Improvement
Components: Webfrontend
Reporter: Timo Walther
Priority: Minor

In the webclient, the -c option is not supported. The webclient takes the command in the form {code}assembler org.apache.flink.WordCountJob{code} This should be consistent with the command line scripts.
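Stephan's suggestion can be sketched as a small standalone loop (illustrative names only, not the actual servlet code): with `break` instead of `pos++`, the `-c`/`-p` flags are accepted only before the first user-program argument, matching the command-line behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch of the flag-parsing loop discussed in the
// review comment. Not Flink's actual JobSubmissionServlet code.
class FlagParsing {
    static String assemblerClass = null;
    static int parallelism = -1;

    // Returns the remaining (non-flag) parameters; flags must come first.
    static List<String> parse(List<String> params) {
        List<String> rest = new ArrayList<>(params);
        int pos = 0;
        while (pos < rest.size()) {
            if (rest.get(pos).equals("-c") && pos + 1 < rest.size()) {
                assemblerClass = rest.get(pos + 1);
                rest.remove(pos);
                rest.remove(pos); // the flag's value has shifted into `pos`
            } else if (rest.get(pos).equals("-p") && pos + 1 < rest.size()) {
                parallelism = Integer.parseInt(rest.get(pos + 1));
                rest.remove(pos);
                rest.remove(pos);
            } else {
                // `break` (rather than pos++) stops at the first non-flag token,
                // so flags are only valid before the program arguments.
                break;
            }
        }
        return rest;
    }
}
```

For example, `parse(["-c", "com.example.WordCount", "-p", "4", "input", "output"])` sets the assembler class and parallelism and leaves `["input", "output"]` as the program's own arguments.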
[jira] [Resolved] (FLINK-1377) Bring Team page up to date
[ https://issues.apache.org/jira/browse/FLINK-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kostas Tzoumas resolved FLINK-1377.
---
Resolution: Fixed

Bring Team page up to date
--------------------------

Key: FLINK-1377
URL: https://issues.apache.org/jira/browse/FLINK-1377
Project: Flink
Issue Type: Task
Components: Project Website
Reporter: Stephan Ewen
Assignee: Kostas Tzoumas

The team page
- misses Vasia and Timo
- still contains incubator mentors
- lists people as PPMC members (rather than PMC members)
[GitHub] flink pull request: [FLINK-1295][FLINK-883] Allow to deploy 'job o...
GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/292

    [FLINK-1295][FLINK-883] Allow to deploy 'job only' YARN cluster. Add tests to YARN

    - Users can now also deploy Flink on YARN for executing a single job.
    - The flink-yarn project has been moved out of the flink-addons module.
    - The MiniYARNCluster is used for testing Flink on YARN.
    - There is now an (undocumented) Java interface to Flink's YARN client, allowing users to manually control the YARN session.
    - ALL ports used by Flink when running on YARN are automatically determined. In the past, users reported problems with blocked ports (YARN tells the client the RPC address of the application master).
    - The checks before deployment have been improved to give better error messages if the user requests too many resources for a YARN session.

    The change has been tested on Google Compute Cloud (click to deploy with Google Cloud Storage, not HDFS) and Amazon EMR (HDFS).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink flink1295-flink883-rebased-after-akka

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/292.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #292

commit 273cafbf44b5796b96ff2396b0f6e68f79ce9185
Author: Robert Metzger <rmetz...@apache.org>
Date:   2014-12-01T17:59:49Z

    [FLINK-1295][FLINK-883] Allow to deploy 'job only' YARN cluster. Add tests to YARN

    - Users can now also deploy Flink on YARN for executing a single job.
    - The flink-yarn project has been moved out of the flink-addons module.
    - The MiniYARNCluster is used for testing Flink on YARN.
    - There is now an (undocumented) Java interface to Flink's YARN client, allowing users to manually control the YARN session.
    - ALL ports used by Flink when running on YARN are automatically determined. In the past, users reported problems with blocked ports (YARN tells the client the RPC address of the application master).
    - The checks before deployment have been improved to give better error messages if the user requests too many resources for a YARN session.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (FLINK-1360) Automatic type extraction for Void is missing
[ https://issues.apache.org/jira/browse/FLINK-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271479#comment-14271479 ]

Felix Neutatz commented on FLINK-1360:
---

Have you already fixed the TypeInfoParser bug or is there another issue in Jira for that?

Automatic type extraction for Void is missing
---------------------------------------------

Key: FLINK-1360
URL: https://issues.apache.org/jira/browse/FLINK-1360
Project: Flink
Issue Type: Bug
Reporter: Felix Neutatz
Priority: Minor

DataSet<Tuple2<Void, Long>> data = env.fromElements(new Tuple2<Void, Long>(null, 1L));
data.print();

throws the following exception:

Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Automatic type extraction is not possible on candidates with null values. Please specify the types directly.
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1174)
	at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1152)
	at org.apache.flink.api.java.ExecutionEnvironment.fromElements(ExecutionEnvironment.java:540)
	at org.apache.flink.hadoopcompatibility.mapreduce.example.ParquetOutput.main(ParquetOutput.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
[jira] [Closed] (FLINK-1360) Automatic type extraction for Void is missing
[ https://issues.apache.org/jira/browse/FLINK-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Neutatz closed FLINK-1360.
---
Resolution: Not a Problem

Please use e.g. TypeExtractor.getForClass() in these cases.

Automatic type extraction for Void is missing
---------------------------------------------

Key: FLINK-1360
URL: https://issues.apache.org/jira/browse/FLINK-1360
Project: Flink
Issue Type: Bug
Reporter: Felix Neutatz
Priority: Minor

DataSet<Tuple2<Void, Long>> data = env.fromElements(new Tuple2<Void, Long>(null, 1L));
data.print();

throws the following exception:

Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Automatic type extraction is not possible on candidates with null values. Please specify the types directly.
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1174)
	at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1152)
	at org.apache.flink.api.java.ExecutionEnvironment.fromElements(ExecutionEnvironment.java:540)
	at org.apache.flink.hadoopcompatibility.mapreduce.example.ParquetOutput.main(ParquetOutput.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
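The reasoning behind the "Not a Problem" resolution can be illustrated with a small self-contained sketch (plain Java, not Flink's actual TypeExtractor): value-based extraction has no object to call getClass() on when the value is null, whereas supplying the Class explicitly works regardless of the values.

```java
// Self-contained illustration of why reflection-based type extraction fails
// on null values, and why a class-based alternative (like Flink's
// TypeExtractor.getForClass) sidesteps the problem. Hypothetical names.
class NullExtraction {
    // Value-based: fails for null, analogous to the exception in the report.
    static Class<?> extractFromValue(Object value) {
        if (value == null) {
            throw new IllegalArgumentException(
                "Automatic type extraction is not possible on null values");
        }
        return value.getClass();
    }

    // Class-based: works regardless of the values the dataset will hold.
    static <T> Class<T> extractFromClass(Class<T> clazz) {
        return clazz;
    }
}
```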
[jira] [Commented] (FLINK-986) Add intermediate results to distributed runtime
[ https://issues.apache.org/jira/browse/FLINK-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271631#comment-14271631 ]

ASF GitHub Bot commented on FLINK-986:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/254#issuecomment-69373922

    Nice. +1 to merge this now!

Add intermediate results to distributed runtime
-----------------------------------------------

Key: FLINK-986
URL: https://issues.apache.org/jira/browse/FLINK-986
Project: Flink
Issue Type: New Feature
Components: Distributed Runtime
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi

Support for intermediate results in the runtime is currently blocking different efforts like fault tolerance or result collection at the client.
[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory
[ https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271519#comment-14271519 ]

ASF GitHub Bot commented on FLINK-1320:
---

Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/290#issuecomment-69361811

    @rmetzger I added some documentation for the config parameter.

Add an off-heap variant of the managed memory
---------------------------------------------

Key: FLINK-1320
URL: https://issues.apache.org/jira/browse/FLINK-1320
Project: Flink
Issue Type: Improvement
Components: Local Runtime
Reporter: Stephan Ewen
Priority: Minor

For (nearly) all memory that Flink accumulates (in the form of sort buffers, hash tables, caching), we use a special way of representing data serialized across a set of memory pages. The big work lies in the way the algorithms are implemented to operate on pages, rather than on objects.

The core class for the memory is the {{MemorySegment}}, which has all methods to set and get primitive values efficiently. It is a somewhat simpler (and faster) variant of a HeapByteBuffer. As such, it should be straightforward to create a version where the memory segment is not backed by a heap byte[], but by memory allocated outside the JVM, in a similar way as the NIO DirectByteBuffers or the Netty direct buffers do it.

This may have multiple advantages:
- We reduce the size of the JVM heap (garbage collected) and the number and size of long-lived objects. For large JVM sizes, this may improve performance quite a bit. Ultimately, we would in many cases reduce the JVM size to 1/3 to 1/2 and keep the remaining memory outside the JVM.
- We save copies when we move memory pages to disk (spilling) or through the network (shuffling / broadcasting / forward piping).

The changes required to implement this are:
- Add an {{UnmanagedMemorySegment}} that only stores the memory address as a long, and the segment size. It is initialized from a DirectByteBuffer.
- Allow the MemoryManager to allocate these MemorySegments, instead of the current ones.
- Make sure that the startup scripts pick up the mode and configure the heap size and the max direct memory properly.

Since the MemorySegment is probably the most performance-critical class in Flink, we must take care that we do this right. The following are critical considerations:
- If we want both solutions (heap and off-heap) to exist side-by-side (configurable), we must make the base MemorySegment abstract and implement two versions (heap and off-heap).
- To get the best performance, we need to make sure that only one class gets loaded (or at least ever used), to ensure optimal JIT de-virtualization and inlining.
- We should carefully measure the performance of both variants. From previous micro-benchmarks, I remember that individual byte accesses in DirectByteBuffers (off-heap) were slightly slower than on-heap; any larger accesses were equally good or slightly better.
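The side-by-side design described in the ticket can be sketched roughly as follows. This is an illustrative sketch, not Flink's actual MemorySegment code; for simplicity the off-heap variant delegates to a DirectByteBuffer rather than holding a raw address.

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the proposal: an abstract segment with a heap-backed
// and an off-heap implementation behind one interface. Names are hypothetical.
abstract class Segment {
    abstract void putInt(int index, int value);
    abstract int getInt(int index);
}

// Heap variant: backed by a garbage-collected byte[], big-endian layout.
class HeapSegment extends Segment {
    private final byte[] memory;
    HeapSegment(int size) { this.memory = new byte[size]; }

    void putInt(int index, int value) {
        memory[index]     = (byte) (value >>> 24);
        memory[index + 1] = (byte) (value >>> 16);
        memory[index + 2] = (byte) (value >>> 8);
        memory[index + 3] = (byte) value;
    }

    int getInt(int index) {
        return ((memory[index] & 0xff) << 24)
             | ((memory[index + 1] & 0xff) << 16)
             | ((memory[index + 2] & 0xff) << 8)
             |  (memory[index + 3] & 0xff);
    }
}

// Off-heap variant: memory allocated outside the garbage-collected heap,
// as NIO DirectByteBuffers do (absolute put/get are also big-endian).
class OffHeapSegment extends Segment {
    private final ByteBuffer memory;
    OffHeapSegment(int size) { this.memory = ByteBuffer.allocateDirect(size); }

    void putInt(int index, int value) { memory.putInt(index, value); }
    int getInt(int index) { return memory.getInt(index); }
}
```

As the ticket notes, keeping only one concrete class loaded (or ever used) at runtime lets the JIT de-virtualize and inline these calls, which matters for such a hot code path.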
[GitHub] flink pull request: [FLINK-1320] Add an off-heap variant of the ma...
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/290#issuecomment-69361811

    @rmetzger I added some documentation for the config parameter.
[jira] [Resolved] (FLINK-1109) Add IntelliJ/Eclipse setup guide for developers
[ https://issues.apache.org/jira/browse/FLINK-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-1109.
---
Resolution: Fixed

Fixed in 7f659f6 and 7e08fa1.

Add IntelliJ/Eclipse setup guide for developers
-----------------------------------------------

Key: FLINK-1109
URL: https://issues.apache.org/jira/browse/FLINK-1109
Project: Flink
Issue Type: Improvement
Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor
Labels: starter
[jira] [Commented] (FLINK-1378) could not find implicit value for evidence parameter of type TypeInformation
[ https://issues.apache.org/jira/browse/FLINK-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271057#comment-14271057 ]

Aljoscha Krettek commented on FLINK-1378:
---

I'm looking into it.

could not find implicit value for evidence parameter of type TypeInformation
----------------------------------------------------------------------------

Key: FLINK-1378
URL: https://issues.apache.org/jira/browse/FLINK-1378
Project: Flink
Issue Type: Bug
Components: Scala API
Affects Versions: 0.7.0-incubating
Reporter: John Sandiford

This is an example of one of many cases that I cannot get to compile with the Scala API. I have tried using the context bounds [T : TypeInformation] and [T : ClassTag] but still cannot get it to work.

//libraryDependencies += "org.apache.flink" % "flink-scala" % "0.7.0-incubating"
//libraryDependencies += "org.apache.flink" % "flink-clients" % "0.7.0-incubating"

import org.apache.flink.api.scala._
import scala.util.{Success, Try}

object Main extends App {
  val env = ExecutionEnvironment.getExecutionEnvironment
  val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)

  def f[T](data: DataSet[T]): DataSet[(T, Try[Seq[Double]])] = {
    data.mapPartition((iterator: Iterator[T]) => {
      val first = iterator.next()
      val second = iterator.next()
      Iterator((first, Success(Seq(2.0, 3.0))), (second, Success(Seq(3.0, 1.0))))
    })
  }

  val g = f(data)
  g.print()
  env.execute("Flink Test")
}
[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory
[ https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271077#comment-14271077 ]

ASF GitHub Bot commented on FLINK-1320:
---

Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/290#issuecomment-69341391

    @StephanEwen Thanks for pointing that out. This should probably go into a separate issue.

Add an off-heap variant of the managed memory
---------------------------------------------

Key: FLINK-1320
URL: https://issues.apache.org/jira/browse/FLINK-1320
Project: Flink
Issue Type: Improvement
Components: Local Runtime
Reporter: Stephan Ewen
Priority: Minor

For (nearly) all memory that Flink accumulates (in the form of sort buffers, hash tables, caching), we use a special way of representing data serialized across a set of memory pages. The big work lies in the way the algorithms are implemented to operate on pages, rather than on objects.

The core class for the memory is the {{MemorySegment}}, which has all methods to set and get primitive values efficiently. It is a somewhat simpler (and faster) variant of a HeapByteBuffer. As such, it should be straightforward to create a version where the memory segment is not backed by a heap byte[], but by memory allocated outside the JVM, in a similar way as the NIO DirectByteBuffers or the Netty direct buffers do it.

This may have multiple advantages:
- We reduce the size of the JVM heap (garbage collected) and the number and size of long-lived objects. For large JVM sizes, this may improve performance quite a bit. Ultimately, we would in many cases reduce the JVM size to 1/3 to 1/2 and keep the remaining memory outside the JVM.
- We save copies when we move memory pages to disk (spilling) or through the network (shuffling / broadcasting / forward piping).

The changes required to implement this are:
- Add an {{UnmanagedMemorySegment}} that only stores the memory address as a long, and the segment size. It is initialized from a DirectByteBuffer.
- Allow the MemoryManager to allocate these MemorySegments, instead of the current ones.
- Make sure that the startup scripts pick up the mode and configure the heap size and the max direct memory properly.

Since the MemorySegment is probably the most performance-critical class in Flink, we must take care that we do this right. The following are critical considerations:
- If we want both solutions (heap and off-heap) to exist side-by-side (configurable), we must make the base MemorySegment abstract and implement two versions (heap and off-heap).
- To get the best performance, we need to make sure that only one class gets loaded (or at least ever used), to ensure optimal JIT de-virtualization and inlining.
- We should carefully measure the performance of both variants. From previous micro-benchmarks, I remember that individual byte accesses in DirectByteBuffers (off-heap) were slightly slower than on-heap; any larger accesses were equally good or slightly better.
[jira] [Assigned] (FLINK-1377) Bring Team page up to date
[ https://issues.apache.org/jira/browse/FLINK-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kostas Tzoumas reassigned FLINK-1377:
---
Assignee: Kostas Tzoumas

Bring Team page up to date
--------------------------

Key: FLINK-1377
URL: https://issues.apache.org/jira/browse/FLINK-1377
Project: Flink
Issue Type: Task
Components: Project Website
Reporter: Stephan Ewen
Assignee: Kostas Tzoumas

The team page
- misses Vasia and Timo
- still contains incubator mentors
- lists people as PPMC members (rather than PMC members)
[jira] [Commented] (FLINK-986) Add intermediate results to distributed runtime
[ https://issues.apache.org/jira/browse/FLINK-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271086#comment-14271086 ]

ASF GitHub Bot commented on FLINK-986:
---

Github user aalexandrov commented on the pull request:

    https://github.com/apache/flink/pull/254#issuecomment-69343812

    I am excited!

Add intermediate results to distributed runtime
-----------------------------------------------

Key: FLINK-986
URL: https://issues.apache.org/jira/browse/FLINK-986
Project: Flink
Issue Type: New Feature
Components: Distributed Runtime
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi

Support for intermediate results in the runtime is currently blocking different efforts like fault tolerance or result collection at the client.
[GitHub] flink pull request: [FLINK-986] [FLINK-25] [Distributed runtime] A...
Github user aalexandrov commented on the pull request:

    https://github.com/apache/flink/pull/254#issuecomment-69343812

    I am excited!