[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886404#comment-15886404 ]

Haohui Mai commented on FLINK-5668:
-----------------------------------

[~bill.liu8904] I believe the title of this JIRA is confusing. Can you summarize what you want to achieve? Adding a PR would also help.

[~rmetzger] Regarding this issue, do you have an idea why the current implementation writes the configuration into a file on {{fs.defaultFS}}? What do you think about passing the configuration through {{taskManagerEnv}} instead? Is there any downside to this approach?

> Reduce dependency on HDFS at job startup time
> ---------------------------------------------
>
>                 Key: FLINK-5668
>                 URL: https://issues.apache.org/jira/browse/FLINK-5668
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Bill Liu
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share
> taskmanager-conf.yaml with the TaskManager.
> It would be better to serve taskmanager-conf.yaml from the JobManager web server
> instead of HDFS, which would reduce the HDFS dependency at job startup.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886342#comment-15886342 ]

Haohui Mai commented on FLINK-5668:
-----------------------------------

[~rmetzger] -- I just want to clarify FLINK-5631 here.

YARN downloads the resources from the specified paths and localizes them on the worker nodes. Note that the {{Path}} class in the Hadoop APIs can name a filesystem other than the one configured in {{fs.defaultFS}}. For example, {{new Path(URI.create("s3a://foo"))}} refers to a resource on S3, regardless of what {{fs.defaultFS}} is set to.

FLINK-5631 enables YARN to localize resources that are not stored on {{fs.defaultFS}}.
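The rule described in this comment (a path's own scheme wins, and the default filesystem is only a fallback) can be illustrated with a minimal sketch that uses only {{java.net.URI}}. {{SchemeResolver}} and {{resolveScheme}} are hypothetical names for illustration, not Hadoop or Flink classes, and the sketch only models the scheme lookup; it does not reproduce how Hadoop's {{Path.getFileSystem}} is implemented.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of Hadoop or Flink. */
final class SchemeResolver {
    private SchemeResolver() {}

    /**
     * Returns the scheme of the filesystem that would serve the given path:
     * the path's own scheme if it has one, otherwise the scheme of the
     * default filesystem URI (the role fs.defaultFS plays in Hadoop).
     */
    static String resolveScheme(String path, String defaultFsUri) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : URI.create(defaultFsUri).getScheme();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://namenode:8020"; // assumed example value
        // Fully qualified path: the scheme comes from the path itself.
        System.out.println(resolveScheme("s3a://foo/flink-dist.jar", defaultFs));
        // Scheme-less path: falls back to the default filesystem.
        System.out.println(resolveScheme("/user/flink/flink-dist.jar", defaultFs));
    }
}
```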
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886220#comment-15886220 ]

Bill Liu commented on FLINK-5668:
---------------------------------

[~rmetzger] [~wheat9] and I are working on implementing a Flink job deployer for YARN with `HttpFs` and `S3`. The YARN container can resolve the `http` and `s3` file schemes.

We use `HttpFs` instead of `HDFS` to bootstrap the JobManager. Here is the code that sets up the AM container (JobManager):

```
// The filesystem is resolved from the path itself (here HttpFs), not from fs.defaultFS.
Path resourcePath = new Path("http://localhost:19989/flink-dist.jar");
FileStatus fileStatus = resourcePath.getFileSystem(yarnConfiguration)
    .getFileStatus(resourcePath);
LOG.info("resource {}", ConverterUtils.getYarnUrlFromPath(resourcePath));
// Register the jar as a local resource so that YARN localizes it for the container.
LocalResource packageResource = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(resourcePath),
    LocalResourceType.FILE,
    LocalResourceVisibility.APPLICATION,
    fileStatus.getLen(),
    fileStatus.getModificationTime());
LOG.info("add localresource {}", packageResource);
localResources.put("flink.jar", packageResource);
amContainer.setLocalResources(localResources);
```

`yarn.deploy.fs` is not a good idea, because these bootstrap jars and files may be located on different filesystems. It is better to parse each jar's Path to determine the underlying filesystem of that jar.
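To make the last point concrete, here is a rough sketch of why a single deployment filesystem key may not fit: bootstrap resources can be grouped by the filesystem that each URI names. It relies only on the JDK. `ResourceGrouping`, `groupByFileSystem`, and the example URIs other than the `flink-dist.jar` one above are assumptions made for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hadoop or Flink. */
final class ResourceGrouping {
    private ResourceGrouping() {}

    /** Groups resource URIs by "scheme://authority", i.e. by the filesystem each one names. */
    static Map<String, List<String>> groupByFileSystem(List<String> resources) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String resource : resources) {
            URI uri = URI.create(resource);
            // Scheme-less paths would be served by the default filesystem.
            String key = uri.getScheme() == null
                    ? "(default)"
                    : uri.getScheme() + "://" + uri.getAuthority();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(resource);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList(
                "http://localhost:19989/flink-dist.jar",
                "s3a://bucket/conf/flink-conf.yaml",   // assumed example location
                "s3a://bucket/lib/user-job.jar");      // assumed example location
        // Each entry maps one filesystem to the resources stored on it.
        groupByFileSystem(resources).forEach((fs, files) -> System.out.println(fs + " -> " + files));
    }
}
```

With resources spread across two filesystems like this, one configuration key could only describe one of them, whereas resolving the filesystem per path covers both.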
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885456#comment-15885456 ]

Robert Metzger commented on FLINK-5668:
---------------------------------------

Sorry that I did not look at this JIRA earlier.

[~bill.liu8904] and [~wheat9], if I understand you correctly, you want Flink on YARN not to use Hadoop's {{fs.defaultFS}} configuration when choosing the filesystem used to distribute jars and configuration files during deployment? This basically means that we need to provide a custom configuration key in Flink (something like {{yarn.deploy.fs}}) that names the filesystem to put these files on. This would allow you to use S3 for deploying Flink and HDFS for RocksDB or other state backups.

I don't completely understand FLINK-5631: how can I register an additional jar from a different filesystem as a required resource?
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885434#comment-15885434 ]

ASF GitHub Bot commented on FLINK-5668:
---------------------------------------

GitHub user rmetzger commented on the issue:

    https://github.com/apache/flink/pull/3413

    I would like to discuss the implementation approach in the [JIRA](https://issues.apache.org/jira/browse/FLINK-5668) before reviewing this pull request.
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883521#comment-15883521 ]

ASF GitHub Bot commented on FLINK-5668:
---------------------------------------

GitHub user billliuatuber opened a pull request:

    https://github.com/apache/flink/pull/3413

    [FLINK-5668] Reduce dependency on HDFS at job startup time

    In the current implementation, the JobManager writes the TaskManager configuration into a file and uploads it to HDFS. This file is used to bootstrap the TaskManager.

    This PR switches to passing the configuration from the JobManager to the TaskManager through the system environment instead of an HDFS file, which reduces the dependency on HDFS.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/billliuatuber/flink FLINK-5668

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3413

commit 547b333203db8e01737a690e06ad5f5663d7faca
Author: Bill Liu
Date:   2017-02-21T03:27:46Z

    [FLINK-5668] reduce hdfs dependency at startup time
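The pull request itself is not reproduced in this thread, so the following is only a minimal sketch of the general idea under stated assumptions: a small key/value configuration is packed into one string that fits in a single environment variable and is unpacked on the receiving side. `EnvConfigCodec` and the variable name `_HYPOTHETICAL_TM_CONFIG` are invented for illustration and are not what the PR uses. The configuration keys are shown only as sample data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical codec for illustration; not the code from the pull request. */
final class EnvConfigCodec {
    private EnvConfigCodec() {}

    /** Serializes the config as Java properties and Base64-encodes it into one line. */
    static String encode(Map<String, String> config) {
        Properties props = new Properties();
        props.putAll(config);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory writer
        }
        return Base64.getEncoder().encodeToString(out.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(): Base64-decodes and parses the properties text. */
    static Map<String, String> decode(String encoded) {
        String text = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> config = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            config.put(name, props.getProperty(name));
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("jobmanager.rpc.address", "10.0.0.1");
        config.put("jobmanager.rpc.port", "6123");

        // Sender side: place the encoded config into the environment map of the process to launch.
        Map<String, String> containerEnv = new HashMap<>();
        containerEnv.put("_HYPOTHETICAL_TM_CONFIG", encode(config));

        // Receiver side: read the variable back and decode it.
        System.out.println(decode(containerEnv.get("_HYPOTHETICAL_TM_CONFIG")));
    }
}
```

One thing to keep in mind with this kind of approach is that the size of a process environment is bounded by the operating system, so it suits small configurations better than large ones.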
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849138#comment-15849138 ]

Bill Liu commented on FLINK-5668:
---------------------------------

Thanks [~wheat9] for filling in the full context.

YARN's own fault tolerance and high availability rely on HDFS, but that doesn't mean Flink-on-YARN has to depend on HDFS, especially since some of the HDFS dependencies are not necessary at all.

As for the TaskManager configuration file, I took a close look at the code. The TaskManager config is cloned from baseConfig, and only a very slight change is made to it:

```
// Call site: derive the TaskManager configuration from the JobManager's config.
final Configuration taskManagerConfig = BootstrapTools.generateTaskManagerConfiguration(
    config, akkaHostname, akkaPort, slotsPerTaskManager, TASKMANAGER_REGISTRATION_TIMEOUT);

public static Configuration generateTaskManagerConfiguration(
        Configuration baseConfig,
        String jobManagerHostname,
        int jobManagerPort,
        int numSlots,
        FiniteDuration registrationTimeout) {
    // Start from a copy of the base config and override only a handful of keys.
    Configuration cfg = baseConfig.clone();
    cfg.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerHostname);
    cfg.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerPort);
    cfg.setString(ConfigConstants.TASK_MANAGER_MAX_REGISTRATION_DURATION, registrationTimeout.toString());
    // -1 means "keep the slot count from the base config".
    if (numSlots != -1) {
        cfg.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots);
    }
    return cfg;
}
```

If the JobManager web server is not a good place to share files, the JobManager doesn't need to create a local taskmanager-conf.yaml at all. It could just pass the base config file plus some dynamic properties that override the values in the base config.
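The "base config plus dynamic properties" alternative can be sketched as follows. The sketch uses plain maps rather than Flink's `Configuration` class. `DynamicOverrides`, `taskManagerOverrides`, and `applyOverrides` are hypothetical names, and the string keys are assumed to correspond to the `ConfigConstants` fields used above. The registration timeout is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch with plain maps; not Flink's Configuration API. */
final class DynamicOverrides {
    private DynamicOverrides() {}

    /** Builds only the entries that differ from the base config. */
    static Map<String, String> taskManagerOverrides(String jobManagerHostname, int jobManagerPort, int numSlots) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("jobmanager.rpc.address", jobManagerHostname);
        overrides.put("jobmanager.rpc.port", Integer.toString(jobManagerPort));
        // -1 means "keep the slot count from the base config".
        if (numSlots != -1) {
            overrides.put("taskmanager.numberOfTaskSlots", Integer.toString(numSlots));
        }
        return overrides;
    }

    /** Receiver side: overrides win over the base config; the base map is left untouched. */
    static Map<String, String> applyOverrides(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("jobmanager.rpc.address", "localhost");
        base.put("taskmanager.heap.mb", "1024");

        Map<String, String> overrides = taskManagerOverrides("jm-host", 6123, 4);
        System.out.println(applyOverrides(base, overrides));
    }
}
```

Only the small override map has to travel from the JobManager to the TaskManager in this scheme. The base config is the file that is already shipped with the deployment.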
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849048#comment-15849048 ]

Haohui Mai commented on FLINK-5668:
-----------------------------------

Please allow me to fill in some of the context here.

The request is to have Flink support alternative filesystems (e.g., S3) in Flink-on-YARN so that our mission-critical jobs can survive HDFS being unavailable. Flink-on-YARN would still depend on the underlying distributed filesystem to meet its high availability and reliability requirements. This JIRA does not intend to change the current mechanisms in Flink.

You are right that YARN itself depends on a distributed filesystem to function correctly. It works well with HDFS, but in general it also works with any filesystem that implements the `FileSystem` API in Hadoop. There are multiple production deployments that run YARN on S3.

Essentially, we would like to take the approach of FLINK-5631 in a more comprehensive way. In many places the Flink-on-YARN implementation simply takes the default filesystem from YARN, although the {{Path}} objects themselves specify the filesystem. It would be great to teach Flink to interpret the {{Path}} objects properly, just as FLINK-5631 has done, so that it becomes possible to run Flink-on-YARN on alternative filesystems such as S3.

Does this make sense to you, [~StephanEwen]?