[jira] [Resolved] (FLINK-1385) Add option to YARN client to disable resource availability checks
[ https://issues.apache.org/jira/browse/FLINK-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1385. --- Resolution: Fixed Fix Version/s: 0.9 Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/8f8efe2f Add option to YARN client to disable resource availability checks - Key: FLINK-1385 URL: https://issues.apache.org/jira/browse/FLINK-1385 Project: Flink Issue Type: Bug Components: YARN Client Affects Versions: 0.8, 0.9 Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9 The YARN client checks which resources are available in the cluster at the time of the initial deployment. If somebody wants to deploy a Flink YARN session to a full YARN cluster, the Flink Client will reject the session. I'm going to add an option which disables the check. The YARN session will then allocate new containers as they become available in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
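The shape of the check being made optional can be sketched as follows. This is an illustrative sketch, not Flink's actual YARN client code; all names (`mayDeploy`, the parameters, the flag) are hypothetical.

```java
// Hypothetical sketch of the kind of availability check FLINK-1385 makes
// optional: before deploying, compare the requested containers against the
// cluster's currently free resources, unless the user disabled the check.
public class ResourceCheckSketch {

    /** Returns true if the deployment may proceed. All names are illustrative. */
    static boolean mayDeploy(int requestedContainers, int containerMemoryMb,
                             int freeContainers, long freeMemoryMb,
                             boolean checksDisabled) {
        if (checksDisabled) {
            // FLINK-1385: skip the check; containers will instead be
            // allocated lazily as the cluster frees resources.
            return true;
        }
        return requestedContainers <= freeContainers
                && (long) requestedContainers * containerMemoryMb <= freeMemoryMb;
    }

    public static void main(String[] args) {
        // A "full" cluster rejects the session unless the check is disabled.
        System.out.println(mayDeploy(4, 1024, 0, 0, false)); // false
        System.out.println(mayDeploy(4, 1024, 0, 0, true));  // true
    }
}
```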
[jira] [Commented] (FLINK-1421) Implement a SAMOA Adapter for Flink Streaming
[ https://issues.apache.org/jira/browse/FLINK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289486#comment-14289486 ] Gianmarco De Francisci Morales commented on FLINK-1421: --- Indeed, having some form of feedback between Processors is quite common is many algorithms, unless they are embarrassingly parallel (e.g., ensemble methods such as Bagging). We are also trying to simplify the API and remove a lot of legacy code. Hopefully, you'll find it easier to test new datasets from real streams very soon. Implement a SAMOA Adapter for Flink Streaming - Key: FLINK-1421 URL: https://issues.apache.org/jira/browse/FLINK-1421 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Paris Carbone Assignee: Paris Carbone Original Estimate: 336h Remaining Estimate: 336h Yahoo's Samoa is an experimental incremental machine learning library that builds on an abstract compositional data streaming model to write streaming algorithms. The task is to provide an adapter from SAMOA topologies to Flink-streaming job graphs in order to support Flink as a backend engine for SAMOA tasks. A statup guide can be viewed here : https://docs.google.com/document/d/18glDJDYmnFGT1UGtZimaxZpGeeg1Ch14NgDoymhPk2A/pub The main working branch of the adapter : https://github.com/senorcarbone/samoa/tree/flink -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1297) Add support for tracking statistics of intermediate results
[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289511#comment-14289511 ] Fridtjof Sander commented on FLINK-1297: I started to prototype the design for this and pushed my first early result into the FLINK-1297 branch in our fork: https://github.com/stratosphere/flink/tree/FLINK-1297 Add support for tracking statistics of intermediate results --- Key: FLINK-1297 URL: https://issues.apache.org/jira/browse/FLINK-1297 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Alexander Alexandrov Assignee: Alexander Alexandrov Fix For: 0.9 Original Estimate: 1,008h Remaining Estimate: 1,008h One of the major problems related to the optimizer at the moment is the lack of proper statistics. With the introduction of staged execution, it is possible to instrument the runtime code with a statistics facility that collects the required information for optimizing the next execution stage. I would therefore like to contribute code that can be used to gather basic statistics for the (intermediate) result of dataflows (e.g. min, max, count, count distinct) and make them available to the job manager. Before I start, I would like to hear some feedback from the other users. In particular, to handle skew (e.g. on grouping) it might be good to have some sort of detailed sketch about the key distribution of an intermediate result. I am not sure whether a simple histogram is the most effective way to go. Maybe somebody would propose another lightweight sketch that provides better accuracy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
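The basic statistics mentioned in the issue (min, max, count, count distinct) could be accumulated per intermediate result roughly like this. This is an illustrative sketch, not Flink's actual statistics facility; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative per-result statistics accumulator: min, max, count and
// count-distinct over an intermediate result's keys. A production version
// would replace the exact distinct set with a lightweight sketch
// (e.g. HyperLogLog) to bound memory, as the issue discussion suggests.
public class ResultStatistics {
    private long count = 0;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private final Set<Long> distinct = new HashSet<>(); // exact; a sketch in practice

    void add(long key) {
        count++;
        min = Math.min(min, key);
        max = Math.max(max, key);
        distinct.add(key);
    }

    long count() { return count; }
    long min() { return min; }
    long max() { return max; }
    long countDistinct() { return distinct.size(); }

    public static void main(String[] args) {
        ResultStatistics stats = new ResultStatistics();
        for (long k : new long[] {3, 1, 4, 1, 5}) stats.add(k);
        System.out.println(stats.count() + " " + stats.min() + " "
                + stats.max() + " " + stats.countDistinct()); // 5 1 5 4
    }
}
```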
[jira] [Resolved] (FLINK-1295) Add option to Flink client to start a YARN session per job
[ https://issues.apache.org/jira/browse/FLINK-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1295. --- Resolution: Fixed Fix Version/s: 0.9 Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/8f8efe2f Add option to Flink client to start a YARN session per job -- Key: FLINK-1295 URL: https://issues.apache.org/jira/browse/FLINK-1295 Project: Flink Issue Type: New Feature Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9 Currently, Flink users can only launch Flink on YARN as a YARN session (meaning a long-running YARN application that can run multiple Flink jobs) Users have requested to extend the Flink Client to allocate YARN containers only for executing a single job. As part of this pull request, I would suggest to refactor the YARN Client to make it more modular and object oriented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289564#comment-14289564 ] ASF GitHub Bot commented on FLINK-1201: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/326#issuecomment-71230743 Ah, sorry, I forgot that, let me do that in a few hours. I am sort of having fun with git right now :-) FYI: What I basically used is the git filter-branch command, using a tree filter and a `mv src/* flink-addons/flink-gelly/src` filter command. Some commits before and after to make sure the directories exist and then, after the relocation, removing them from the history. Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1381) Only one output splitter supported per operator
[ https://issues.apache.org/jira/browse/FLINK-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289834#comment-14289834 ] ASF GitHub Bot commented on FLINK-1381: --- GitHub user qmlmoon opened a pull request: https://github.com/apache/flink/pull/332 [FLINK-1381] Allow multiple output splitters for single stream operator You can merge this pull request into a Git repository by running: $ git pull https://github.com/qmlmoon/incubator-flink stream-split Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/332.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #332 commit cff3eb0747f17185c4d71cf5117696d28f879964 Author: mingliang qmlm...@gmail.com Date: 2015-01-23T09:59:02Z [FLINK-1381] Allow multiple output splitters for single stream operator Only one output splitter supported per operator --- Key: FLINK-1381 URL: https://issues.apache.org/jira/browse/FLINK-1381 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Gyula Fora Priority: Minor Currently the streaming api only supports output splitting once per operator. The splitting logic should be reworked to allow any number of splitters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
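The reworked splitting logic can be pictured as a chain of selectors per operator rather than a single one. The sketch below is illustrative only; `MultiSplitter`, `addSelector`, and `route` are hypothetical names, not the streaming API's actual classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch: each record passes through every registered selector,
// and the record is routed to all output names any selector returns.
public class MultiSplitter<T> {
    private final List<Function<T, List<String>>> selectors = new ArrayList<>();

    MultiSplitter<T> addSelector(Function<T, List<String>> selector) {
        selectors.add(selector);
        return this;
    }

    /** Routes each record to every output name chosen by any selector. */
    Map<String, List<T>> route(List<T> records) {
        Map<String, List<T>> outputs = new HashMap<>();
        for (T record : records) {
            for (Function<T, List<String>> selector : selectors) {
                for (String name : selector.apply(record)) {
                    outputs.computeIfAbsent(name, k -> new ArrayList<>()).add(record);
                }
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        MultiSplitter<Integer> splitter = new MultiSplitter<Integer>()
                .addSelector(i -> List.of(i % 2 == 0 ? "even" : "odd"))
                .addSelector(i -> List.of(i > 2 ? "large" : "small"));
        Map<String, List<Integer>> out = splitter.route(List.of(1, 2, 3, 4));
        System.out.println(out.get("even"));  // [2, 4]
        System.out.println(out.get("large")); // [3, 4]
    }
}
```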
[jira] [Commented] (FLINK-629) Add support for null values to the java api
[ https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289753#comment-14289753 ] mustafa elbehery commented on FLINK-629: [~rmetzger] I need this fix in the release-0.8 Maven repo. I have cherry-picked the fix into the 0.8 branch; should I push it so that you can merge it? Add support for null values to the java api --- Key: FLINK-629 URL: https://issues.apache.org/jira/browse/FLINK-629 Project: Flink Issue Type: Improvement Components: Java API Reporter: Stephan Ewen Assignee: Gyula Fora Priority: Critical Labels: github-import Fix For: pre-apache Attachments: Selection_006.png, SimpleTweetInputFormat.java, Tweet.java, model.tar.gz Currently, many runtime operations fail when encountering a null value. Tuple serialization should allow null fields. I suggest to add a method to the tuples called `getFieldNotNull()` which throws a meaningful exception when the accessed field is null. That way, we simplify the logic of operators that should not deal with null fields, like key grouping or aggregations. Even though SQL allows grouping and aggregating of null values, I suggest to exclude this from the java api, because the SQL semantics of aggregating null fields are messy. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/629 Created by: [StephanEwen|https://github.com/StephanEwen] Labels: enhancement, java api, Milestone: Release 0.5.1 Created at: Wed Mar 26 00:27:49 CET 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
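The suggested `getFieldNotNull()` accessor could look roughly like this. The `NullSafeTuple` stand-in below is a simplified sketch, not Flink's actual generated tuple classes, and the exception message is illustrative.

```java
// Sketch of the accessor suggested in the issue: like an ordinary field
// getter, but failing with a meaningful exception when the field is null.
public class NullSafeTuple {
    private final Object[] fields;

    NullSafeTuple(Object... fields) { this.fields = fields; }

    @SuppressWarnings("unchecked")
    <T> T getFieldNotNull(int pos) {
        Object field = fields[pos];
        if (field == null) {
            throw new NullPointerException(
                "Field " + pos + " is null. Operators such as key grouping and "
                + "aggregations do not support null key fields.");
        }
        return (T) field;
    }

    public static void main(String[] args) {
        NullSafeTuple t = new NullSafeTuple("hello", null);
        System.out.println((String) t.getFieldNotNull(0)); // hello
        try {
            t.getFieldNotNull(1); // throws with a meaningful message
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```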
[jira] [Commented] (FLINK-1342) Quickstart's assembly can possibly filter out user's code
[ https://issues.apache.org/jira/browse/FLINK-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289913#comment-14289913 ] Fabian Hueske commented on FLINK-1342: -- This is also true for all dependencies which are from the o.a.f namespace. For example flink-hadoop-compatibility is not added to ./lib and not in Flink's class path by default. Programs which depend on the hadoop-compat module and which are built with the quick start Maven config won't run unless the hadoop-compat jar is manually added to ./lib. I'd suggest only filtering those dependencies which are present in ./lib, even though this might result in a rather bulky blacklist. Quickstart's assembly can possibly filter out user's code - Key: FLINK-1342 URL: https://issues.apache.org/jira/browse/FLINK-1342 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Márton Balassi Fix For: 0.9, 0.8.1 I've added a quick solution for [1] for the time being. The assembly still filters out everything from the org.apache.flink namespace, so any user code placed there will be missing from the fat jar. If we do not use filtering at all the size of the jar goes up to almost 100 MB. [1] https://issues.apache.org/jira/browse/FLINK-1225 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons
Github user cebe commented on the pull request: https://github.com/apache/flink/pull/326#issuecomment-71268972 that really sounds like great fun :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/328#discussion_r23485615

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -175,62 +175,79 @@ import scala.collection.JavaConverters._
   }

   private def tryJobManagerRegistration(): Unit = {
-    registrationAttempts = 0
-    import context.dispatcher
-    registrationScheduler = Some(context.system.scheduler.schedule(
-      TaskManager.REGISTRATION_DELAY, TaskManager.REGISTRATION_INTERVAL,
-      self, RegisterAtJobManager))
+    registrationDuration = 0 seconds
+
+    registered = false
+
+    context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
   }

   override def receiveWithLogMessages: Receive = {
     case RegisterAtJobManager => {
-      registrationAttempts += 1
+      if (!registered) {
+        registrationDuration += registrationDelay
+        // double delay for exponential backoff
+        registrationDelay *= 2

-      if (registered) {
-        registrationScheduler.foreach(_.cancel())
-      }
-      else if (registrationAttempts <= TaskManager.MAX_REGISTRATION_ATTEMPTS) {
+        if (registrationDuration > maxRegistrationDuration) {
+          log.warning("TaskManager could not register at JobManager {} after {}.", jobManagerAkkaURL,
-        log.info("Try to register at master {}. Attempt #{}", jobManagerAkkaURL,
-          registrationAttempts)
-        val jobManager = context.actorSelection(jobManagerAkkaURL)
+            maxRegistrationDuration)

-        jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
-      }
-      else {
-        log.error("TaskManager could not register at JobManager.");
-        self ! PoisonPill
+          self ! PoisonPill
+        } else if (!registered) {
+          log.info(s"Try to register at master ${jobManagerAkkaURL}. ${registrationAttempts}. " +
+            s"Attempt")
+          val jobManager = context.actorSelection(jobManagerAkkaURL)
+
+          jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
+
+          context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
+        }
       }
     }

     case AcknowledgeRegistration(id, blobPort) => {
-      if (!registered) {
+      if(!registered) {
+        finishRegistration(id, blobPort)
         registered = true
-        currentJobManager = sender
-        instanceID = id
-
-        context.watch(currentJobManager)
-
-        log.info("TaskManager successfully registered at JobManager {}.",
-          currentJobManager.path.toString)
-
-        setupNetworkEnvironment()
-        setupLibraryCacheManager(blobPort)
+      } else {
+        if (log.isDebugEnabled) {
+          log.debug("The TaskManager {} is already registered at the JobManager {}, but received " +
+            "another AcknowledgeRegistration message.", self.path, currentJobManager.path)
+        }
+      }
+    }

-        heartbeatScheduler = Some(context.system.scheduler.schedule(
-          TaskManager.HEARTBEAT_INTERVAL, TaskManager.HEARTBEAT_INTERVAL, self, SendHeartbeat))
+    case AlreadyRegistered(id, blobPort) =>
+      if(!registered) {
+        log.warning("The TaskManager {} seems to be already registered at the JobManager {} even " +
+          "though it has not yet finished the registration process.", self.path, sender.path)

-        profiler foreach {
-          _.tell(RegisterProfilingListener, JobManager.getProfiler(currentJobManager))
+        finishRegistration(id, blobPort)
+        registered = true
+      } else {
+        // ignore AlreadyRegistered messages which arrived after AcknowledgeRegistration
+        if(log.isDebugEnabled){
+          log.debug("The TaskManager {} has already been registered at the JobManager {}.",
+            self.path, sender.path)
         }
+      }

-        for (listener <- waitForRegistration) {
-          listener ! RegisteredAtJobManager
-        }
+    case RefuseRegistration(reason) =>
+      if(!registered) {
+        log.error("The registration of task manager {} was refused by the job manager {} " +
+          "because {}.", self.path, jobManagerAkkaURL, reason)

-        waitForRegistration.clear()
+        // Shut task manager down
+        self ! PoisonPill
+      } else {
+        // ignore RefuseRegistration messages which arrived after AcknowledgeRegistration
+        if(log.isDebugEnabled) {
--- End diff --

This is
[jira] [Resolved] (FLINK-1406) Windows compatibility
[ https://issues.apache.org/jira/browse/FLINK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1406. -- Resolution: Fixed Fix Version/s: 0.9 Fixed in 96585f149c8d100f3d8fce2bb737743695f20c12 Windows compatibility - Key: FLINK-1406 URL: https://issues.apache.org/jira/browse/FLINK-1406 Project: Flink Issue Type: Improvement Reporter: Max Michels Priority: Minor Fix For: 0.9 Attachments: flink_1406.patch The documentation [1] states: Flink runs on all UNIX-like environments: Linux, Mac OS X, Cygwin. The only requirement is to have a working Java 6.x (or higher) installation. I just found out that Flink also runs natively on Windows. Do we want to support Windows? If so, we should update the documentation. Clearly, we don't support it at the moment for development. At multiple places, the tests contain references to /tmp or /dev/random/ which are not Windows compatible. Probably it's enough to update the documentation stating that Flink runs on Windows but cannot be built or developed on Windows without Cygwin. [1] http://flink.apache.org/docs/0.7-incubating/setup_quickstart.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
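One way to remove the hard-coded /tmp references the issue mentions is to let the JDK resolve the platform's temp directory. A minimal sketch (not taken from Flink's test code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Instead of hard-coding "/tmp", ask the JDK for a temp file: this resolves
// under java.io.tmpdir, i.e. /tmp on most UNIX systems and %TEMP% on Windows.
public class PortableTempFile {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("flink-test-", ".tmp");
        System.out.println(Files.exists(tmp)); // true
        Files.delete(tmp);
    }
}
```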
[jira] [Updated] (FLINK-1344) Add implicit conversion from scala streams to DataStreams
[ https://issues.apache.org/jira/browse/FLINK-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paris Carbone updated FLINK-1344: - Description: Source definitions in the scala-api pass a collector to the UDF, thus enforcing an imperative style for defining custom streams. To encourage a purely functional coding style in the streaming scala-api while also adding some interoperability with scala constructs (ie. Seq and Streams) it would be nice to add an implicit conversion from Seq[T] to DataStream[T]. (An upcoming idea would be for sinks to also support wrapping up flink streams to scala streams for full interoperability with scala streaming code.) was: Source definitions in the scala-api pass a collector to the UDF, thus enforcing an imperative style for defining custom streams. In order maintain a purely functional coding style in the streaming scala-api while also adding some interoperability with scala constructs (ie. Seq and Streams) [UPDATE] Since there is already a fromCollection(Seq) function we can simply add an implicit conversion from Seq[T] to DataStream[T]. (An additional idea would be for sinks could also support wrapping up flink streams to scala streams for full interoperability with scala streaming code.) Add implicit conversion from scala streams to DataStreams - Key: FLINK-1344 URL: https://issues.apache.org/jira/browse/FLINK-1344 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Paris Carbone Assignee: Paris Carbone Priority: Trivial Source definitions in the scala-api pass a collector to the UDF, thus enforcing an imperative style for defining custom streams. To encourage a purely functional coding style in the streaming scala-api while also adding some interoperability with scala constructs (ie. Seq and Streams) it would be nice to add an implicit conversion from Seq[T] to DataStream[T]. 
(An upcoming idea would be for sinks to also support wrapping up flink streams to scala streams for full interoperability with scala streaming code.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/328#discussion_r23485570

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -175,62 +175,79 @@ import scala.collection.JavaConverters._
   }

   private def tryJobManagerRegistration(): Unit = {
-    registrationAttempts = 0
-    import context.dispatcher
-    registrationScheduler = Some(context.system.scheduler.schedule(
-      TaskManager.REGISTRATION_DELAY, TaskManager.REGISTRATION_INTERVAL,
-      self, RegisterAtJobManager))
+    registrationDuration = 0 seconds
+
+    registered = false
+
+    context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
   }

   override def receiveWithLogMessages: Receive = {
     case RegisterAtJobManager => {
-      registrationAttempts += 1
+      if (!registered) {
+        registrationDuration += registrationDelay
+        // double delay for exponential backoff
+        registrationDelay *= 2

-      if (registered) {
-        registrationScheduler.foreach(_.cancel())
-      }
-      else if (registrationAttempts <= TaskManager.MAX_REGISTRATION_ATTEMPTS) {
+        if (registrationDuration > maxRegistrationDuration) {
+          log.warning("TaskManager could not register at JobManager {} after {}.", jobManagerAkkaURL,
-        log.info("Try to register at master {}. Attempt #{}", jobManagerAkkaURL,
-          registrationAttempts)
-        val jobManager = context.actorSelection(jobManagerAkkaURL)
+            maxRegistrationDuration)

-        jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
-      }
-      else {
-        log.error("TaskManager could not register at JobManager.");
-        self ! PoisonPill
+          self ! PoisonPill
+        } else if (!registered) {
+          log.info(s"Try to register at master ${jobManagerAkkaURL}. ${registrationAttempts}. " +
+            s"Attempt")
+          val jobManager = context.actorSelection(jobManagerAkkaURL)
+
+          jobManager ! RegisterTaskManager(connectionInfo, hardwareDescription, numberOfSlots)
+
+          context.system.scheduler.scheduleOnce(registrationDelay, self, RegisterAtJobManager)
+        }
      }
    }

     case AcknowledgeRegistration(id, blobPort) => {
-      if (!registered) {
+      if(!registered) {
+        finishRegistration(id, blobPort)
         registered = true
-        currentJobManager = sender
-        instanceID = id
-
-        context.watch(currentJobManager)
-
-        log.info("TaskManager successfully registered at JobManager {}.",
-          currentJobManager.path.toString)
-
-        setupNetworkEnvironment()
-        setupLibraryCacheManager(blobPort)
+      } else {
+        if (log.isDebugEnabled) {
--- End diff --

Small nit: with slf4j formatting we do not need to check isDebugEnabled anymore, because it uses the parameterized-messages feature that checks for it before materializing the string. It will keep the code cleaner =)

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-1344) Add implicit conversion from scala streams to DataStreams
[ https://issues.apache.org/jira/browse/FLINK-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paris Carbone updated FLINK-1344: - Summary: Add implicit conversion from scala streams to DataStreams (was: Add implicit conversions from scala streams to DataStreams) Add implicit conversion from scala streams to DataStreams - Key: FLINK-1344 URL: https://issues.apache.org/jira/browse/FLINK-1344 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Paris Carbone Assignee: Paris Carbone Priority: Trivial Source definitions in the scala-api pass a collector to the UDF, thus enforcing an imperative style for defining custom streams. In order to maintain a purely functional coding style in the streaming scala-api while also adding some interoperability with scala constructs (ie. Seq and Streams) [UPDATE] Since there is already a fromCollection(Seq) function we can simply add an implicit conversion from Seq[T] to DataStream[T]. (An additional idea would be for sinks to also support wrapping up flink streams to scala streams for full interoperability with scala streaming code.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1438) ClassCastException for Custom InputSplit in local mode
Fabian Hueske created FLINK-1438: Summary: ClassCastException for Custom InputSplit in local mode Key: FLINK-1438 URL: https://issues.apache.org/jira/browse/FLINK-1438 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.8 Reporter: Fabian Hueske Priority: Minor Jobs with custom InputSplits fail with a ClassCastException such as {{org.apache.flink.examples.java.misc.CustomSplitTestJob$TestFileInputSplit cannot be cast to org.apache.flink.examples.java.misc.CustomSplitTestJob$TestFileInputSplit}} if executed on a local setup. This issue is probably related to different ClassLoaders used by the JobManager when InputSplits are generated and when they are handed to the InputFormat by the TaskManager. Moving the class of the custom InputSplit into the {{./lib}} folder and removing it from the job's jar makes the job work. To reproduce the bug, run the following job on a local setup.
{code}
public class CustomSplitTestJob {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> x = env.createInput(new TestFileInputFormat());
    x.print();
    env.execute();
  }

  public static class TestFileInputFormat implements InputFormat<String, TestFileInputSplit> {

    @Override
    public void configure(Configuration parameters) {
    }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException {
      return null;
    }

    @Override
    public TestFileInputSplit[] createInputSplits(int minNumSplits) throws IOException {
      return new TestFileInputSplit[]{new TestFileInputSplit()};
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(TestFileInputSplit[] inputSplits) {
      return new LocatableInputSplitAssigner(inputSplits);
    }

    @Override
    public void open(TestFileInputSplit split) throws IOException {
    }

    @Override
    public boolean reachedEnd() throws IOException {
      return false;
    }

    @Override
    public String nextRecord(String reuse) throws IOException {
      return null;
    }

    @Override
    public void close() throws IOException {
    }
  }

  public static class TestFileInputSplit extends FileInputSplit {
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
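The suspected cause (the same class seen through two different ClassLoaders) can be demonstrated in isolation. This sketch is not Flink code; it only illustrates why two loaders make a cast between identically named classes fail.

```java
import java.net.URL;
import java.net.URLClassLoader;

// The same class loaded through two independent class loaders yields two
// distinct runtime Class objects, so an instance of one is not an instance
// of the other -- exactly what surfaces as the ClassCastException above.
public class ClassLoaderDemo {

    public static class Payload {}

    /** Loads Payload through two independent loaders (no parent delegation). */
    static Class<?>[] loadTwice() throws Exception {
        URL codeSource = ClassLoaderDemo.class.getProtectionDomain()
                .getCodeSource().getLocation();
        try (URLClassLoader a = new URLClassLoader(new URL[] {codeSource}, null);
             URLClassLoader b = new URLClassLoader(new URL[] {codeSource}, null)) {
            return new Class<?>[] {
                a.loadClass("ClassLoaderDemo$Payload"),
                b.loadClass("ClassLoaderDemo$Payload")
            };
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?>[] loaded = loadTwice();
        // Same fully qualified name, but not the same class object:
        System.out.println(loaded[0] == loaded[1]);
        // Hence a cast across the two loaders fails at runtime:
        Object instance = loaded[0].getDeclaredConstructor().newInstance();
        System.out.println(loaded[1].isInstance(instance));
    }
}
```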
[GitHub] flink pull request: [FLINK-1168] Adds multi-char field delimiter s...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/264#issuecomment-71285644 That's a good point. I copied the code from the DelimitedInputFormat which allows specifying the charset for the record delimiter. So if we go with 1), we should also fix the DelimitedIF to UTF-8. For 2), we would need to extend StringValue, which does not support charsets. 1) would be the easy fix; charset support could be added later. Any objections to going for 1)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1168) Support multi-character field delimiters in CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290218#comment-14290218 ] ASF GitHub Bot commented on FLINK-1168: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/264#issuecomment-71285644 That's a good point. I copied the code from the DelimitedInputFormat which allows specifying the charset for the record delimiter. So if we go with 1), we should also fix the DelimitedIF to UTF-8. For 2), we would need to extend StringValue, which does not support charsets. 1) would be the easy fix; charset support could be added later. Any objections to going for 1)? Support multi-character field delimiters in CSVInputFormats --- Key: FLINK-1168 URL: https://issues.apache.org/jira/browse/FLINK-1168 Project: Flink Issue Type: Improvement Affects Versions: 0.7.0-incubating Reporter: Fabian Hueske Assignee: Manu Kaul Priority: Minor Labels: starter The CSVInputFormat supports multi-char (String) line delimiters, but only single-char (char) field delimiters. This issue proposes to add support for multi-char field delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
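What the improvement enables can be sketched with plain Java, independent of the CSVInputFormat internals (this is an illustration, not the format's actual parsing code):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Splitting a CSV line on a multi-character field delimiter instead of a
// single char. Pattern.quote ensures delimiters like "||" are literal,
// and the -1 limit keeps trailing empty fields.
public class MultiCharDelimiter {
    static String[] splitFields(String line, String fieldDelimiter) {
        return line.split(Pattern.quote(fieldDelimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFields("a||b||", "||"))); // [a, b, ]
    }
}
```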
[jira] [Created] (FLINK-1440) Missing plan visualizer image for http://flink.apache.org/docs/0.8/programming_guide.html page
Henry Saputra created FLINK-1440: Summary: Missing plan visualizer image for http://flink.apache.org/docs/0.8/programming_guide.html page Key: FLINK-1440 URL: https://issues.apache.org/jira/browse/FLINK-1440 Project: Flink Issue Type: Bug Components: Project Website Reporter: Henry Saputra In the http://flink.apache.org/docs/0.8/programming_guide.html page looks like we are missing http://flink.apache.org/docs/0.8/img/plan_visualizer2.png file. I could not find it anywhere in the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1353] Fixes the Execution to use the co...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/325#issuecomment-71170205 I'll merge it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value
[ https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289029#comment-14289029 ] ASF GitHub Bot commented on FLINK-1353: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/325#issuecomment-71170205 I'll merge it. Execution always uses DefaultAkkaAskTimeout, rather than the configured value - Key: FLINK-1353 URL: https://issues.apache.org/jira/browse/FLINK-1353 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Till Rohrmann Fix For: 0.9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Michels reassigned FLINK-1436: -- Assignee: Max Michels Command-line interface verbose option (-v) -- Key: FLINK-1436 URL: https://issues.apache.org/jira/browse/FLINK-1436 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Reporter: Max Michels Assignee: Max Michels Priority: Trivial Labels: starter, usability

Let me run just a basic Flink job and add the verbose flag. It's a general option, so let me add it as a first parameter:

./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar hdfs:///input hdfs:///output9

Invalid action!
./flink <ACTION> [GENERAL_OPTIONS] [ARGUMENTS]

general options:
  -h,--help      Show the help for the CLI Frontend.
  -v,--verbose   Print more detailed error messages.

Action "run" compiles and runs a program.
  Syntax: run [OPTIONS] <jar-file> <arguments>
  "run" action arguments:
    -c,--class <classname>           Class with the program entry point ("main" method or "getPlan()" method). Only needed if the JAR file does not specify the class in its manifest.
    -m,--jobmanager <host:port>      Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
    -p,--parallelism <parallelism>   The parallelism with which to run the program. Optional flag to override the default value specified in the configuration.

Action "info" displays information about a program.
  "info" action arguments:
    -c,--class <classname>           Class with the program entry point ("main" method or "getPlan()" method). Only needed if the JAR file does not specify the class in its manifest.
    -e,--executionplan               Show optimized execution plan of the program (JSON)
    -m,--jobmanager <host:port>      Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
    -p,--parallelism <parallelism>   The parallelism with which to run the program. Optional flag to override the default value specified in the configuration.

Action "list" lists running and finished programs.
  "list" action arguments:
    -m,--jobmanager <host:port>      Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
    -r,--running                     Show running programs and their JobIDs
    -s,--scheduled                   Show scheduled programs and their JobIDs

Action "cancel" cancels a running program.
  "cancel" action arguments:
    -i,--jobid <jobID>               JobID of program to cancel
    -m,--jobmanager <host:port>      Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.

What just happened? This results in a lot of output which is usually generated if you use the --help option on command-line tools. If your terminal window is large enough, then you will see a tiny message: "Please specify an action." I did specify an action. Strange. If you read the help messages carefully you see that general options belong to the action:

./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar hdfs:///input hdfs:///output9

For the sake of mitigating user frustration, let us also accept -v as the first argument. It may seem trivial for the day-to-day Flink user but makes a difference for a novice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
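The requested behavior, accepting general options before the action, can be sketched like this. All names below are illustrative, not the actual CLI frontend's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the fix the issue asks for: peel general options such as -v
// off the front of the argument list before resolving the action, instead
// of rejecting the whole invocation.
public class CliFrontendSketch {
    static boolean verbose = false;

    /** Returns the remaining arguments with leading general options consumed. */
    static List<String> consumeGeneralOptions(String[] args) {
        List<String> rest = new ArrayList<>(Arrays.asList(args));
        while (!rest.isEmpty()
                && (rest.get(0).equals("-v") || rest.get(0).equals("--verbose"))) {
            verbose = true;
            rest.remove(0);
        }
        return rest; // first remaining element is the action, e.g. "run"
    }

    public static void main(String[] args) {
        List<String> rest = consumeGeneralOptions(
                new String[] {"-v", "run", "WordCount.jar"});
        System.out.println(verbose);     // true
        System.out.println(rest.get(0)); // run
    }
}
```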
[jira] [Commented] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value
[ https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289043#comment-14289043 ] ASF GitHub Bot commented on FLINK-1353: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/325 Execution always uses DefaultAkkaAskTimeout, rather than the configured value - Key: FLINK-1353 URL: https://issues.apache.org/jira/browse/FLINK-1353 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Till Rohrmann Fix For: 0.9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value
[ https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-1353. Resolution: Fixed Fixed with a5702b69ed7794abdd44035581077b09fc551450. Execution always uses DefaultAkkaAskTimeout, rather than the configured value - Key: FLINK-1353 URL: https://issues.apache.org/jira/browse/FLINK-1353 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Till Rohrmann Fix For: 0.9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1353] Fixes the Execution to use the co...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/325 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289085#comment-14289085 ] ASF GitHub Bot commented on FLINK-1352: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71177177 Yeah, you're right that the exponential backoff was the default behaviour before. I think Stephan's proposal is the best solution. I'll implement it and also change the Akka messages so that in the future a TaskManager can be refused by the JobManager, if needed. Buggy registration from TaskManager to JobManager - Key: FLINK-1352 URL: https://issues.apache.org/jira/browse/FLINK-1352 Project: Flink Issue Type: Bug Components: JobManager, TaskManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Till Rohrmann Fix For: 0.9 The JobManager's InstanceManager may refuse the registration attempt from a TaskManager because it already has this TaskManager connected or, in the future, because the TaskManager has been blacklisted as unreliable. Upon refused registration, the instance ID is null to signal the refusal. The TaskManager reacts incorrectly to such messages, assuming successful registration. Possible solution: the JobManager sends back a dedicated RegistrationRefused message if the instance manager returns null as the registration result. If the TaskManager receives that before being registered, it knows that the registration response was lost (which should not happen over TCP and would indicate a corrupt connection). Follow-up question: does it make sense to have the TaskManager try indefinitely to connect to the JobManager, with increasing intervals (from seconds to minutes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
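The follow-up question above suggests retrying registration with increasing intervals. A minimal sketch of such a capped exponential backoff, assuming illustrative constants (1-second start, 5-minute cap) that are not taken from Flink's implementation:

```java
// Hedged sketch of capped exponential backoff for registration retries.
// Constants and names are illustrative assumptions, not Flink's code.
public class RegistrationBackoff {
    static final long INITIAL_MILLIS = 1_000;    // start at 1 second
    static final long MAX_MILLIS = 5 * 60_000;   // cap at 5 minutes

    /** Delay in milliseconds before the given (0-based) retry attempt. */
    public static long delayForAttempt(int attempt) {
        long delay = INITIAL_MILLIS;
        // Double the delay per attempt until the cap is reached.
        for (int i = 0; i < attempt && delay < MAX_MILLIS; i++) {
            delay *= 2;
        }
        return Math.min(delay, MAX_MILLIS);
    }
}
```

A retry loop would sleep for `delayForAttempt(n)` before the n-th attempt, so a temporarily unreachable JobManager is polled gently rather than flooded.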
[GitHub] flink pull request: [FLINK-1389] Allow changing the filenames of t...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/301#issuecomment-71178244 Hadoop's `FileOutputFormat` has the following method

```java
public static String getUniqueName(JobConf conf, String name) {
  int partition = conf.getInt(JobContext.TASK_PARTITION, -1);
  if (partition == -1) {
    throw new IllegalArgumentException(
        "This method can only be called from within a Job");
  }
  String taskType = (conf.getBoolean(JobContext.TASK_ISMAP, true)) ? "m" : "r";
  NumberFormat numberFormat = NumberFormat.getInstance();
  numberFormat.setMinimumIntegerDigits(5);
  numberFormat.setGroupingUsed(false);
  return name + "-" + taskType + "-" + numberFormat.format(partition);
}
```

So users have to overwrite the method if they want to use a custom filename. I'll remove the configuration-based approach. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
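For reference, the naming scheme produced by the method quoted above (e.g. `part-m-00003`) can be reproduced standalone with just `java.text.NumberFormat`. This is an illustrative extraction under assumed names, not Hadoop's actual class:

```java
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative reproduction of Hadoop-style partition file naming:
// <name>-<m|r>-<zero-padded 5-digit partition>.
public class UniqueName {
    public static String uniqueName(String name, boolean isMap, int partition) {
        String taskType = isMap ? "m" : "r";
        NumberFormat nf = NumberFormat.getInstance(Locale.ROOT);
        nf.setMinimumIntegerDigits(5);  // pad to 5 digits, e.g. 3 -> 00003
        nf.setGroupingUsed(false);      // no thousands separators
        return name + "-" + taskType + "-" + nf.format(partition);
    }
}
```

A custom extension (the feature requested in FLINK-1389, e.g. `.avro`) would then be a matter of appending a suffix to the returned name.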
[jira] [Commented] (FLINK-1425) Turn lazy operator execution off for streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289273#comment-14289273 ] ASF GitHub Bot commented on FLINK-1425: --- GitHub user uce opened a pull request: https://github.com/apache/flink/pull/330 [FLINK-1425] [streaming] Add scheduling of all tasks at once See corresponding [discussion](http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201501.mbox/%3CCA%2Bfaj9yHzsNDW-6hUVdoL_OY0EoVMRrEO-6fWjyD9A_oJsxvnA%40mail.gmail.com%3E) on the mailing list. You can merge this pull request into a Git repository by running: $ git pull https://github.com/uce/incubator-flink schedule-all Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/330.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #330 commit 1f82e0fe8eed1947f95aa2e5d276a1432171a187 Author: Ufuk Celebi u...@apache.org Date: 2015-01-23T10:30:30Z [FLINK-1425] [streaming] Add scheduling of all tasks at once Turn lazy operator execution off for streaming programs --- Key: FLINK-1425 URL: https://issues.apache.org/jira/browse/FLINK-1425 Project: Flink Issue Type: Task Components: Streaming Affects Versions: 0.9 Reporter: Gyula Fora Assignee: Ufuk Celebi Streaming programs currently use the same lazy operator execution model as batch programs. This makes the functionality of some operators like time based windowing very awkward, since they start computing windows based on the start of the operator. Also, one should expect for streaming programs to run continuously so there is not much to gain from lazy execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1436) Command-line interface verbose option error reporting
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Michels updated FLINK-1436: --- Summary: Command-line interface verbose option error reporting (was: Command-line interface verbose option (-v)) Command-line interface verbose option error reporting --- Key: FLINK-1436 URL: https://issues.apache.org/jira/browse/FLINK-1436 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Reporter: Max Michels Assignee: Max Michels Priority: Trivial Labels: starter, usability -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289416#comment-14289416 ] Max Michels commented on FLINK-1436: --- https://github.com/apache/flink/pull/331 Command-line interface verbose option (-v) -- Key: FLINK-1436 URL: https://issues.apache.org/jira/browse/FLINK-1436 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Reporter: Max Michels Assignee: Max Michels Priority: Trivial Labels: starter, usability -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-1421) Implement a SAMOA Adapter for Flink Streaming
[ https://issues.apache.org/jira/browse/FLINK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289468#comment-14289468 ] Paris Carbone edited comment on FLINK-1421 at 1/23/15 4:29 PM: --- :) Samoa is a really interesting project; we can currently create Flink datastreams from Samoa topologies, though only in cases where there are no cyclic dependencies. Apparently, cycles are quite common in Samoa tasks, so we will support them very soon through Flink iterations. I bet the performance will be comparatively good as well. I am really curious to run some cross-platform benchmarks. was (Author: senorcarbone): :) Samoa is a really interesting project, we can currently create flink datastreams from samoa topologies, just in cases where there are no cyclic dependencies. Apparently, cycles are quite common in Samoa tasks so we will support this very soon through Flink iterations. Implement a SAMOA Adapter for Flink Streaming - Key: FLINK-1421 URL: https://issues.apache.org/jira/browse/FLINK-1421 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Paris Carbone Assignee: Paris Carbone Original Estimate: 336h Remaining Estimate: 336h Yahoo's Samoa is an experimental incremental machine learning library that builds on an abstract compositional data streaming model to write streaming algorithms. The task is to provide an adapter from SAMOA topologies to Flink-streaming job graphs in order to support Flink as a backend engine for SAMOA tasks. A startup guide can be viewed here: https://docs.google.com/document/d/18glDJDYmnFGT1UGtZimaxZpGeeg1Ch14NgDoymhPk2A/pub The main working branch of the adapter: https://github.com/senorcarbone/samoa/tree/flink -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/326#issuecomment-71204417 Yes. The package seems to be still wrong. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1433: -- Fix Version/s: 0.8.1 Add HADOOP_CLASSPATH to start scripts - Key: FLINK-1433 URL: https://issues.apache.org/jira/browse/FLINK-1433 Project: Flink Issue Type: Improvement Reporter: Robert Metzger Fix For: 0.8.1 With the Hadoop file system wrapper, it's important to have access to the Hadoop filesystem classes. HADOOP_CLASSPATH seems to be a standard environment variable used by Hadoop for such libraries. Deployments like Google Compute Cloud set this variable to contain the Google Cloud Storage Hadoop wrapper. So if users want to use Cloud Storage in a non-YARN environment, we need to address this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1389) Allow setting custom file extensions for files created by the FileOutputFormat
[ https://issues.apache.org/jira/browse/FLINK-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289366#comment-14289366 ] ASF GitHub Bot commented on FLINK-1389: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/301#issuecomment-71208053 I've updated the pull request. Please review again, I would like to merge this soon. Allow setting custom file extensions for files created by the FileOutputFormat -- Key: FLINK-1389 URL: https://issues.apache.org/jira/browse/FLINK-1389 Project: Flink Issue Type: New Feature Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor A user requested the ability to name avro files with the avro extension. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1427) Configuration through environment variables
[ https://issues.apache.org/jira/browse/FLINK-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289427#comment-14289427 ] Ufuk Celebi commented on FLINK-1427: --- You could easily check whether the variable is already set and print a warning. It's just a matter of what we want to give precedence to. I agree that it is error-prone to configure the same thing in different places (although I think that Alexander was referring to something else). Since no user has really complained so far, we should stick to your non-intrusive proposal and essentially keep everything as is. Configuration through environment variables --- Key: FLINK-1427 URL: https://issues.apache.org/jira/browse/FLINK-1427 Project: Flink Issue Type: Improvement Components: Local Runtime Environment: Deployment Reporter: Max Michels Priority: Minor Labels: configuration, deployment Like Hadoop or Spark, Flink should support configuration via shell environment variables. In cluster setups, this makes things a lot easier because writing config files can be omitted. Many automation tools (e.g., Google's bdutil) use (or abuse) this feature. For example, to set the task manager heap size, we would run `export FLINK_TASKMANAGER_HEAP=4096` before starting the task manager on a node to set the heap memory size to 4096 MB. Environment variables should overwrite the regular config entries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
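The proposed precedence (environment variables overwrite regular config entries) could be sketched as below. The FLINK_* to config-key mapping is an assumption made for illustration, not an actual Flink mechanism:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: apply FLINK_* environment variables on top of a
// config map, e.g. FLINK_TASKMANAGER_HEAP overrides "taskmanager.heap".
// The key-mapping convention here is a hypothetical choice.
public class EnvConfig {
    /** Returns a copy of 'config' with FLINK_* entries from 'env' applied. */
    public static Map<String, String> withEnvOverrides(
            Map<String, String> config, Map<String, String> env) {
        Map<String, String> result = new HashMap<>(config);
        for (Map.Entry<String, String> e : env.entrySet()) {
            String key = e.getKey();
            if (key.startsWith("FLINK_")) {
                // FLINK_TASKMANAGER_HEAP -> taskmanager.heap
                String configKey = key.substring("FLINK_".length())
                        .toLowerCase().replace('_', '.');
                result.put(configKey, e.getValue());
            }
        }
        return result;
    }
}
```

In production code `env` would come from `System.getenv()`; passing it in as a map keeps the precedence logic testable.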
[jira] [Commented] (FLINK-1421) Implement a SAMOA Adapter for Flink Streaming
[ https://issues.apache.org/jira/browse/FLINK-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289468#comment-14289468 ] Paris Carbone commented on FLINK-1421: -- :) Samoa is a really interesting project, we can currently create flink datastreams from samoa topologies, just in cases where there are no cyclic dependencies. Apparently, cycles are quite common in Samoa tasks so we will support this very soon through Flink iterations. Implement a SAMOA Adapter for Flink Streaming - Key: FLINK-1421 URL: https://issues.apache.org/jira/browse/FLINK-1421 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Paris Carbone Assignee: Paris Carbone Original Estimate: 336h Remaining Estimate: 336h -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/331#discussion_r23455785 --- Diff: flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java --- @@ -960,14 +960,15 @@ public int parseParameters(String[] args) { // check for action if (args.length < 1) { - System.out.println("Please specify an action."); printHelp(); + System.out.println(); + System.out.println("Please specify an action."); --- End diff -- instead of adding an additional call each time for the new-line, why aren't you pre-pending a '\n' to the string? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/331#discussion_r23455746 --- Diff: flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java --- @@ -945,13 +948,10 @@ private int handleError(Throwable t) { private void evaluateGeneralOptions(CommandLine line) { // check help flag this.printHelp = line.hasOption(HELP_OPTION.getOpt()); - - // check verbosity flag - this.verbose = line.hasOption(VERBOSE_OPTION.getOpt()); } /** -* Parses the command line arguments and starts the requested action. +* Parses the command lin e arguments and starts the requested action. --- End diff -- typo --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [Flink-1436] refactor CLiFrontend to provide m...
GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/331 [Flink-1436] refactor CLiFrontend to provide more identifiable and meaningful error messages You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink flink-1436 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #331 commit 1bb326c347ce26b0458a4ed437e418a227820ed4 Author: Max m...@posteo.de Date: 2015-01-23T13:05:31Z [FLINK-1436] more meaningful error messages commit 5daf545384b1a2dfbed5273c691804ed45e211c1 Author: Max m...@posteo.de Date: 2015-01-23T13:05:55Z [FLINK-1436] remove verbose flag and default to verbose output commit 4cc1f691df57a0f411419283522952ba13a4d713 Author: Max m...@posteo.de Date: 2015-01-23T14:31:07Z [FLINK-1436] rework error message logic using checked exceptions in particular, error messages are printed after usage information which provides information about the error immediately (no more scrolling) commit ab0bfa0fbd854facfbee6f74a601be975ea268b3 Author: Max m...@posteo.de Date: 2015-01-23T14:39:40Z [FLINK-1436] adapt tests for the new error handling of CliFrontend --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289426#comment-14289426 ] ASF GitHub Bot commented on FLINK-1436: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/331#issuecomment-71212582 Testing JIRA integration FLINK-1436 Lets see Command-line interface verbose option (-v) -- Key: FLINK-1436 URL: https://issues.apache.org/jira/browse/FLINK-1436 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Reporter: Max Michels Assignee: Max Michels Priority: Trivial Labels: starter, usability -- This message was sent by Atlassian JIRA (v6.3.4#6332)