Re: Improve the documentation of the Flink Architecture and internals
Why do you want to split stuff between the docs in the repository and the wiki? I for one would always be too lazy to check a wiki when there is also a documentation. Plus, this would lead to additional overhead in deciding what goes where and in syncing between the two places for documentation.

On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen se...@apache.org wrote: Ah, I totally forgot to add to the internals:
- Fault tolerance in batch mode
- Fault tolerance in streaming mode, with state handling

On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen se...@apache.org wrote: Hi all! I would like to kick off an effort to improve the documentation of the Flink architecture and internals. This also means making the streaming architecture more prominent in the docs. Being quite a sophisticated stack, we need to improve the presentation of how Flink works - to the extent necessary to use Flink (and to appreciate all the cool stuff that is happening). This should also come in handy for new contributors. As a general umbrella, we first need to decide where and how to organize the documentation. I would propose to put the bulk of the documentation into the wiki: create a dedicated section on Flink internals with sub-pages for each component / topic. To the docs, we add a general overview from which we link into the wiki.

== These sections would go into the DOCS in the git repository ==
- Overview of a program: pre-flight phase (type extraction, optimizer), JobManager, TaskManager. Differences between streaming and batch. We can realize this through one very nice picture with a few lines of text.
- High-level architecture stack, different program representations (API operators, common API DAG, optimizer DAG, parallel data flow (JobGraph / ExecutionGraph))
- (maybe) Parallelism and scheduling. This seems to be paramount for users to understand.
- Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)

== These sections would go into the WIKI ==
- Project structure (Maven projects, what is where, dependencies between projects)
- Component overview
- JobManager (InstanceManager, Scheduler, BLOB server, Library Cache, Archiving)
- TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
- Involved actor systems / actors / messages
- Details about submitting a job (library upload, job graph submission, execution graph setup, scheduling trigger)
- Memory management
- Optimizer internals
- Akka setup specifics
- Netty and pluggable data exchange strategies
- Testing: Flink test clusters and unit test utilities
- Developer how-to: setting up Eclipse, IntelliJ, Travis
- Step-by-step guide to adding a new operator

I will go ahead and stub some sections in the wiki. As we discuss and agree/disagree on the outline, we can evolve the wiki. Greetings, Stephan
Introduction of new Gelly developer
Hello everyone, I am ZHOU Yi, a student working on graph algorithms, especially combinatorial graph algorithms. I am glad to join the Flink development group. I am working on the Gelly graph library, hoping to contribute improvements and new features to it (an algorithms library, graph partitioning improvements, etc.; issue26 https://github.com/project-flink/flink-graph/issues/20). For now, I plan to start with the affinity propagation example, FLINK-1707 https://issues.apache.org/jira/browse/FLINK-1707. My Apache account is joey00! (a little bizarre ;-) ) and my GitHub account is joey001. I have joined several lab group projects (AUTOSAR, TTCN-3), but I don't have much open source experience, especially with a large project like Flink. Your suggestions and comments are welcome. Best regards, ZHOU Yi
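For context on FLINK-1707: affinity propagation clusters points by iterating two message updates, responsibilities r(i,k) and availabilities a(i,k), until exemplars emerge. Below is a minimal, self-contained Java sketch of those update rules on a dense similarity matrix, purely as an illustration of the algorithm; it is not Gelly code, and the damping factor and fixed iteration count are simplifying assumptions.

/*
 * Dense affinity propagation sketch (after Frey & Dueck, 2007).
 * s[i][k] is the similarity of point i to candidate exemplar k;
 * s[k][k] holds the self-preference. Illustration only, not Gelly code.
 */
public class AffinityPropagationSketch {

    public static int[] cluster(double[][] s, int iterations, double damping) {
        int n = s.length;
        double[][] r = new double[n][n]; // responsibilities
        double[][] a = new double[n][n]; // availabilities

        for (int iter = 0; iter < iterations; iter++) {
            // r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < n; k++) {
                    double max = Double.NEGATIVE_INFINITY;
                    for (int kp = 0; kp < n; kp++) {
                        if (kp != k) max = Math.max(max, a[i][kp] + s[i][kp]);
                    }
                    r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - max);
                }
            }
            // a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)));
            // the diagonal case a(k,k) is the plain sum over i' != k of max(0, r(i',k))
            for (int k = 0; k < n; k++) {
                for (int i = 0; i < n; i++) {
                    double sum = 0.0;
                    for (int ip = 0; ip < n; ip++) {
                        if (ip != i && ip != k) sum += Math.max(0.0, r[ip][k]);
                    }
                    double update = (i == k) ? sum : Math.min(0.0, r[k][k] + sum);
                    a[i][k] = damping * a[i][k] + (1 - damping) * update;
                }
            }
        }

        // Each point picks the exemplar k maximizing a(i,k) + r(i,k).
        int[] exemplar = new int[n];
        for (int i = 0; i < n; i++) {
            int best = 0;
            for (int k = 1; k < n; k++) {
                if (a[i][k] + r[i][k] > a[i][best] + r[i][best]) best = k;
            }
            exemplar[i] = best;
        }
        return exemplar;
    }
}

A Gelly version would express the same two updates as messages flowing along the edges of a similarity graph instead of dense matrix sweeps.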
Re: [DISCUSS] Issues with heterogeneity of the code
+1 for enforcing a more strict Java code style. However, let's not introduce a line length of 100 like in Scala; I think that hurts the readability of the code. On Sat, Mar 14, 2015 at 4:41 PM, Ufuk Celebi u...@apache.org wrote: On Saturday, March 14, 2015, Aljoscha Krettek aljos...@apache.org wrote: I'm in favor of strict coding styles. And I like the Google style. +1 I would like that. We essentially all agree that we want more homogeneity, and I think strict rules are the only way to go. Since this is a very subjective matter, it makes sense to go with something (somewhat) well established like the Google code style.
Re: [DISCUSS] Issues with heterogeneity of the code
+1 for not limiting the line length. 2015-03-16 14:39 GMT+01:00 Stephan Ewen se...@apache.org: +1 for not limiting the line length. Everyone should have a good sense for breaking lines. When, in exceptional cases, people violate this, it is usually for a good reason.
Re: [DISCUSS] Issues with heterogeneity of the code
+1 for the stricter Java code style. We should not forget about providing code formatter settings for Eclipse and IntelliJ IDEA (as mentioned above); that would help a lot. (Of course, if we use the Google code style, they already provide such files: https://code.google.com/p/google-styleguide/source/browse/trunk/intellij-java-google-style.xml .)
Re: [DISCUSS] Name of Expression API and DataSet abstraction
+1 for Max's suggestion. On Mon, Mar 16, 2015 at 10:32 AM, Ufuk Celebi u...@apache.org wrote: On Fri, Mar 13, 2015 at 6:08 PM, Maximilian Michels m...@apache.org wrote: Thanks for starting the discussion. We should definitely not keep flink-expressions. I'm in favor of DataTable for the DataSet abstraction equivalent. For consistency, the package name should then be flink-table. At first sight, the name seems kind of plain, but I think it is quite intuitive because the API enables you to work in a SQL-like fashion. +1 I think this is a very good suggestion. :-) (There is an associated issue we shouldn't forget to close: https://issues.apache.org/jira/browse/FLINK-1623)
[jira] [Created] (FLINK-1706) Add disk spilling to BarrierBuffer
Gyula Fora created FLINK-1706: - Summary: Add disk spilling to BarrierBuffer Key: FLINK-1706 URL: https://issues.apache.org/jira/browse/FLINK-1706 Project: Flink Issue Type: Improvement Components: Distributed Runtime, Streaming Reporter: Gyula Fora Assignee: Gyula Fora Fix For: 0.9 The barrier buffers, used by streaming fault tolerance to synchronize on superstep barriers, currently don't spill to disk, which can cause out-of-memory errors when some input task is delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
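To make the proposed improvement concrete: the idea behind disk spilling is to keep buffered records in memory only up to a budget and append any overflow to a temporary file, replaying everything in order once the barrier arrives. The following is a minimal, hypothetical Java sketch of that technique; it is not Flink's BarrierBuffer, and the length-prefixed file format and memory budget are assumptions for illustration.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Sketch of a record buffer that spills to disk once a memory budget is exceeded. */
public class SpillingBufferSketch {

    private final long memoryBudget;                       // max bytes kept in memory
    private final Queue<byte[]> inMemory = new ArrayDeque<>();
    private long inMemoryBytes;
    private File spillFile;                                // created lazily on first overflow
    private DataOutputStream spillOut;

    public SpillingBufferSketch(long memoryBudget) {
        this.memoryBudget = memoryBudget;
    }

    /** Buffers one record; everything after the first overflow goes to disk, preserving order. */
    public void add(byte[] record) throws IOException {
        if (spillOut == null && inMemoryBytes + record.length <= memoryBudget) {
            inMemory.add(record);
            inMemoryBytes += record.length;
        } else {
            if (spillOut == null) {
                spillFile = File.createTempFile("barrier-buffer", ".spill");
                spillOut = new DataOutputStream(
                        new BufferedOutputStream(new FileOutputStream(spillFile)));
            }
            spillOut.writeInt(record.length);              // simple length-prefixed format
            spillOut.write(record);
        }
    }

    /** Replays all buffered records in arrival order: in-memory part first, then the spill file. */
    public void drainTo(Consumer<byte[]> consumer) throws IOException {
        for (byte[] record : inMemory) {
            consumer.accept(record);
        }
        inMemory.clear();
        if (spillOut != null) {
            spillOut.close();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(spillFile)))) {
                while (true) {
                    int len;
                    try {
                        len = in.readInt();
                    } catch (EOFException eof) {
                        break;                             // end of the spill file reached
                    }
                    byte[] record = new byte[len];
                    in.readFully(record);
                    consumer.accept(record);
                }
            }
            spillFile.delete();
            spillOut = null;
        }
    }
}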
Re: [DISCUSS] Name of Expression API and DataSet abstraction
+1 for DataTable On Mon, Mar 16, 2015 at 4:11 PM Till Rohrmann trohrm...@apache.org wrote: +1 for DataTable
[jira] [Created] (FLINK-1708) KafkaSink not working from Usercode classloader
Robert Metzger created FLINK-1708: - Summary: KafkaSink not working from Usercode classloader Key: FLINK-1708 URL: https://issues.apache.org/jira/browse/FLINK-1708 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger The KafkaSink cannot be used from a user jar. I have a fix ready. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
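The issue description doesn't include the patch, but bugs of this kind typically arise when library code (such as the Kafka client) loads classes reflectively through the thread context classloader, which does not know about the user jar. A common remedy, sketched below as a hypothetical illustration rather than the actual FLINK-1708 fix, is to install the user-code classloader as the context classloader around the sensitive initialization:

import java.util.concurrent.Callable;

/** Sketch: run an action with the user-code classloader installed as context classloader. */
public final class UserCodeClassLoaderUtil {

    public static <T> T runWithUserCodeClassLoader(ClassLoader userCodeLoader,
                                                   Callable<T> action) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(userCodeLoader);  // classes from the user jar become visible
        try {
            return action.call();                       // e.g. create the Kafka producer here
        } finally {
            current.setContextClassLoader(previous);    // always restore the original loader
        }
    }
}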
Re: [DISCUSS] Issues with heterogeneity of the code
Do we already enforce the official Scala style guide strictly? On Mon, Mar 16, 2015 at 4:57 PM, Aljoscha Krettek aljos...@apache.org wrote: I'm already always sticking to the official Scala style guide, with the exception of the 100-character line length. On Mar 16, 2015 3:27 PM, Till Rohrmann trohrm...@apache.org wrote: +1 for stricter Java code styles. I haven't looked into the Google code style, but maybe we make it easier for new contributors if we apply a coding style that is already known. +1 for a line length of 100 for Scala code; I think it makes code review on GitHub easier. For the Scala style, we could stick to the official style guidelines [1]. [1] http://docs.scala-lang.org/style/
[jira] [Created] (FLINK-1709) Add support for programs with higher-than-slot parallelism
Ufuk Celebi created FLINK-1709: -- Summary: Add support for programs with higher-than-slot parallelism Key: FLINK-1709 URL: https://issues.apache.org/jira/browse/FLINK-1709 Project: Flink Issue Type: Improvement Affects Versions: master Reporter: Ufuk Celebi Currently, we can't run programs with higher parallelism than available slots. For example, if we currently have a map-reduce program and 4 task slots configured (e.g. 2 task managers with 2 slots per task manager), the map and reduce tasks will be scheduled with pipelined results and the same parallelism in shared slots. Setting the parallelism to more than available slots will result in a NoResourcesAvailableException. As a first step to support these kinds of programs, we can add initial support for this when running in batch mode (after https://github.com/apache/flink/pull/471 is merged). This is easier than the original pipelined scenario, because the map tasks can be deployed after each other to produce the blocking result. The blocking result can then be consumed after all map tasks produced their result. The mechanism in #471 to deploy result receivers can be used for this and should not need any modifications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
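For illustration, here is a hedged sketch of the failing scenario using the Java DataSet API (method names as in current snapshots; the output path is a placeholder): with only 4 slots available, the program below currently fails at scheduling time with a NoResourcesAvailableException.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

/** Sketch of a job whose requested parallelism exceeds the configured task slots. */
public class HighParallelismExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // With 2 task managers x 2 slots = 4 slots, requesting parallelism 8
        // currently leads to a NoResourcesAvailableException when scheduling.
        env.setParallelism(8);

        DataSet<String> words = env.fromElements("to", "be", "or", "not", "to", "be");
        DataSet<String> upper = words.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.toUpperCase();
            }
        });

        upper.writeAsText("file:///tmp/high-parallelism-out"); // placeholder output path
        env.execute("example with parallelism > slots");
    }
}

With the proposed batch-mode support, the map tasks could instead be deployed one after another into the 4 slots, producing a blocking intermediate result that the downstream tasks consume afterwards.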
Re: Could not build up connection to JobManager
It is really strange. It's right that the CliFrontend now resolves localhost to the correct local address 10.218.100.122. Moreover, according to the logs, the JobManager is also started and binds to akka.tcp://flink@10.218.100.122:6123. According to the logs, this is also the address the CliFrontend uses to connect to the JobManager. If the timestamps are correct, then the JobManager was still alive when the job was sent. I don't really understand why this happens. Can it be that the CliFrontend, which binds to 127.0.0.1, cannot communicate with 10.218.100.122? Can it be that you have some settings which prevent this? For the failing 127.0.0.1 case, it would be helpful to have access to the JobManager log. I've updated the branch https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for the localhost scenario. Could you try it out again? Thanks a lot for your help. Best regards, Till

On Mon, Mar 16, 2015 at 10:30 AM, Ufuk Celebi u...@apache.org wrote: There was an issue for this: https://issues.apache.org/jira/browse/FLINK-1634 Can we close it then?

On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hey Stephan, Great to know you could fix the issue. Thank you for the update. Best regards.

On Mar 14, 2015, at 9:19 PM, Stephan Ewen se...@apache.org wrote: Hey Dulaj! Forget what I said in the previous email. The issue with the wrong address binding seems to be solved now. There is another issue that the embedded TaskManager does not start properly, for whatever reason. My gut feeling is that something is wrong there. There is a patch pending that changes the startup behavior to make these situations much easier to debug. I'll ping you as soon as that is in... Stephan

On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen se...@apache.org wrote: Hey Dulaj! One thing you can try is to add the option -Djava.net.preferIPv4Stack=true to the JVM startup options (in the scripts in the bin folder) and see if that helps? Stephan

On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, Still no luck. I'll upload the logs for the configurations "localhost" as well as "127.0.0.1" so you can take a look. 127.0.0.1: flink-Vidura-flink-client-localhost.log https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log localhost: flink-Vidura-flink-client-localhost.log https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log

On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org wrote: Hi Dulaj, sorry for my late response. It looks as if the JobClient tries to connect to the JobManager using its IPv6 address instead of IPv4. Akka is really picky when it comes to remote addresses. If Akka binds to the FQDN, then other ActorSystems that try to connect to it using its IP address won't be successful. I assume that this might be the problem. I tried to fix it; you can find the fix here [1]. Could you please try it out by starting a local cluster with the start-local.sh script? If it fails, could you please send me all log files (client, jobmanager and taskmanager)? Once we have figured out why the JobClient does not connect, we can try to tackle the BlobServer issue. Cheers, Till [1] https://github.com/tillrohrmann/flink/tree/fixJobClient

On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, The error message is:

21:06:01,521 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
    at org.apache.flink.client.program.Client.run(Client.java:327)
    at org.apache.flink.client.program.Client.run(Client.java:306)
    at org.apache.flink.client.program.Client.run(Client.java:300)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
    at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
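Since the thread keeps coming back to how "localhost" and the local hostname resolve, a tiny standalone diagnostic can help compare what the client and the JobManager each see. This sketch uses only the standard java.net API and is unrelated to Flink's own code; running it with and without -Djava.net.preferIPv4Stack=true shows whether the JVM prefers the IPv6 addresses that Akka then binds to.

import java.net.InetAddress;

/** Prints how "localhost" and the machine's own hostname resolve on this JVM. */
public class ResolveCheck {

    public static void main(String[] args) throws Exception {
        // All addresses behind "localhost" (hosts file / DNS): 127.0.0.1, ::1, or both.
        for (InetAddress addr : InetAddress.getAllByName("localhost")) {
            System.out.println("localhost -> " + addr.getHostAddress());
        }
        // The address behind the local hostname; this is the kind of address
        // an ActorSystem may end up binding to.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
    }
}

Running it once with java ResolveCheck and once with java -Djava.net.preferIPv4Stack=true ResolveCheck shows whether the flag changes the resolved addresses.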
Re: [DISCUSS] Name of Expression API and DataSet abstraction
I am also more in favor of Rel and Relation, but DataTable nicely follows the terms DataSet and DataStream. On Mar 16, 2015 4:58 PM, Aljoscha Krettek aljos...@apache.org wrote: I like Relation, or Rel, which is shorter.
Re: [DISCUSS] Issues with heterogeneity of the code
I'm already always sticking to the official Scala style guide, with the exception of the 100-character line length.
Re: Could not build up connection to JobManager
Could you please upload the logs? They would be really helpful. On Mon, Mar 16, 2015 at 6:11 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, I tested the update but it's still the same. I think it isn't a problem with my system, because I have an XAMPP server working totally fine (I tried with it shut down as well) and I also double-checked the hosts file. I had Little Snitch installed, but I also tried uninstalling it. Isn't there a way around this without using DNS to resolve localhost?
Re: Improve the documentation of the Flink Architecture and internals
I have put my suggested version of an outline for the docs into the wiki. Regardless of where the docs end up (wiki or repository), we can use the wiki to outline them. https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals Some pages contain a stub or outline, others are completely blank. This is not a complete list; additions are welcome.
Re: Improve the documentation of the Flink Architecture and internals
I think the wiki has a much lower barrier of entry for fixing docs, especially for external people; the docs, with the Jekyll setup, are rather tricky. I would very much like all kinds of people to contribute to the docs about the internals, not just the usual three suspects who have done this so far. Having a good landing page in the regular docs is exactly to not lose all the people who do not look into a wiki. The overview pages for the internals need to be good and accessible and should link nicely into the wiki to forward people there. The overhead of deciding what goes where should not be terribly large, in my opinion, since there is no really wrong place to put things.
Re: Could not build up connection to JobManager
There was an issue for this: https://issues.apache.org/jira/browse/FLINK-1634 Can we close it then?

On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, The error message is:

21:06:01,521 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
    at org.apache.flink.client.program.Client.run(Client.java:327)
    at org.apache.flink.client.program.Client.run(Client.java:306)
    at org.apache.flink.client.program.Client.run(Client.java:300)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
    at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:250)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
    at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
    at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
    at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
    at
Re: [DISCUSS] Offer Flink with Scala 2.11
Bumping this thread. The PR build passes, and we need some sort of strategy (add a suffix to the artifacts or force a manual build for 2.11) before we merge. 2015-03-11 19:15 GMT+01:00 Robert Metzger rmetz...@apache.org: Thank you for opening the pull request. It looks great so far. One question came up while discussing Alex's changes in the PR: to make this change really usable, we need to provide our users with different poms for each artifact, containing either Scala 2.10 or 2.11. It's basically the same thing we did for Hadoop's dependencies. We would then end up with the versions 0.9.0 and 0.9.0-hadoop1 and artifacts like flink-core and flink-core_2.11, so each module will publish 4 artifacts to Maven Central. Another option would be to keep everything as it is right now; then users would need to set the properties correctly when building their Flink projects. On Wed, Mar 11, 2015 at 12:41 AM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: The PR is here: https://github.com/apache/flink/pull/477 Cheers! 2015-03-10 18:07 GMT+01:00 Alexander Alexandrov alexander.s.alexand...@gmail.com: Yes, will do. 2015-03-10 16:39 GMT+01:00 Robert Metzger rmetz...@apache.org: Very nice work. The changes are probably somewhat easy to merge; except for the version properties in the parent pom, there should not be a bigger issue. Can you also add additional build profiles to Travis for Scala 2.11? On Tue, Mar 10, 2015 at 2:50 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: We have it almost ready here: https://github.com/stratosphere/flink/commits/scala_2.11_rebased I wanted to open a PR today. 2015-03-10 11:28 GMT+01:00 Robert Metzger rmetz...@apache.org: Hey Alex, I don't know the exact status of the Scala 2.11 integration, but I wanted to point you to https://github.com/apache/flink/pull/454, which is changing a huge portion of our Maven build infrastructure. If you haven't started yet, it might make sense to base your integration on that pull request. Otherwise, let me know if you have trouble rebasing your changes. On Mon, Mar 2, 2015 at 9:13 PM, Chiwan Park chiwanp...@icloud.com wrote: +1 for Scala 2.11 Regards, Chiwan Park (Sent with iPhone) On Mar 3, 2015, at 2:43 AM, Robert Metzger rmetz...@apache.org wrote: I'm +1 if this doesn't affect existing Scala 2.10 users. I would also suggest adding a Scala 2.11 build to Travis to ensure everything is working with the different Hadoop/JVM versions. It shouldn't be a big deal to offer scala_version x hadoop_version builds for newer releases; you only need to add more builds here: https://github.com/apache/flink/blob/master/tools/create_release_files.sh#L131 On Mon, Mar 2, 2015 at 6:17 PM, Till Rohrmann trohrm...@apache.org wrote: +1 for Scala 2.11 On Mon, Mar 2, 2015 at 5:02 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Spark currently only provides pre-builds for 2.10 and requires a custom build for 2.11. Not sure whether this is the best idea, but I can see the benefits from a project management point of view... Would you prefer to have {scala_version} × {hadoop_version} builds integrated on the website? 2015-03-02 16:57 GMT+01:00 Aljoscha Krettek aljos...@apache.org: +1 I also like it. We just have to figure out how we can publish two sets of release artifacts. On Mon, Mar 2, 2015 at 4:48 PM, Stephan Ewen se...@apache.org wrote: Big +1 from my side! Does it have to be a Maven profile, or does a Maven property work? (A profile may be needed for the quasiquotes dependency?)
On Mon, Mar 2, 2015 at 4:36 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: Hi there, since I'm relying on Scala 2.11.4 in a project I've been working on, I created a branch which updates the Scala version used by Flink from 2.10.4 to 2.11.4: https://github.com/stratosphere/flink/commits/scala_2.11 Everything seems to work fine, and the PR contains minor changes compared to Spark's: https://issues.apache.org/jira/browse/SPARK-4466 If you're interested, I can rewrite this as a Maven profile and open a PR so people can build Flink with 2.11 support. I suggest doing this sooner rather than later in order to