[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-27 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886404#comment-15886404
 ] 

Haohui Mai commented on FLINK-5668:
---

[~bill.liu8904] I believe the title of the jira is confusing. Can you summarize what you want to achieve? Adding a PR will also help.

[~rmetzger] Regarding this issue, do you have an idea why the current implementation writes the configuration into a file on {{fs.defaultFS}}? What do you think about passing the configuration through the {{taskManagerEnv}} instead? Any downside to this approach?
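
To make the suggestion concrete, here is a rough sketch (not Flink's actual implementation) of how the generated TaskManager configuration could be handed to the container through its environment instead of a file on the default filesystem. The environment variable name and the key=value encoding are illustrative assumptions only:

```
// Sketch only: pass the TaskManager config via the YARN container environment
// instead of uploading taskmanager-conf.yaml to the default filesystem.
// The variable name FLINK_TM_DYNAMIC_PROPERTIES and the encoding are assumptions.
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.configuration.Configuration;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public class TaskManagerEnvSketch {

    static void addConfigToEnv(Configuration taskManagerConfig,
                               ContainerLaunchContext taskManagerContainer) {
        // Serialize each config entry as KEY=VALUE, one per line.
        StringBuilder sb = new StringBuilder();
        for (String key : taskManagerConfig.keySet()) {
            sb.append(key).append('=')
              .append(taskManagerConfig.getString(key, "")).append('\n');
        }

        Map<String, String> env = new HashMap<>();
        env.put("FLINK_TM_DYNAMIC_PROPERTIES", sb.toString());
        taskManagerContainer.setEnvironment(env);
    }
}
```

On the TaskManager side the entries could then be read back with {{System.getenv}} and applied on top of the base config.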


> Reduce dependency on HDFS at job startup time
> ----------------------------------------------
>
> Key: FLINK-5668
> URL: https://issues.apache.org/jira/browse/FLINK-5668
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN
> Reporter: Bill Liu
>  Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share taskmanager-conf.yaml with the TaskManagers.
> It would be better to serve taskmanager-conf.yaml from the JobManager web server instead of HDFS, which would reduce the HDFS dependency at job startup.





[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-27 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886342#comment-15886342
 ] 

Haohui Mai commented on FLINK-5668:
---

[~rmetzger] -- just want to clarify FLINK-5631 here.

YARN downloads the resources from the specified paths and localizes them on the worker nodes. Note that the {{Path}} class in the Hadoop APIs supports specifying a filesystem other than the one configured in {{fs.defaultFS}}. For example, {{new Path(URI.create("s3a://foo"))}} specifies a resource on S3, regardless of what {{fs.defaultFS}} is set to. FLINK-5631 enables YARN to localize resources that are not stored on {{fs.defaultFS}}.
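
As a small self-contained illustration of this behavior (a sketch, not Flink code; the bucket name and the availability of the S3A connector on the classpath are assumed):

```
// Sketch: a Hadoop Path carries its own scheme, so resolving the filesystem
// from the Path returns S3A even when fs.defaultFS points at HDFS.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathResolutionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // fs.defaultFS may be hdfs://...

        Path s3Path = new Path(URI.create("s3a://my-bucket/flink-dist.jar"));

        // Resolved from the path's own scheme, not from fs.defaultFS.
        FileSystem fsFromPath = s3Path.getFileSystem(conf);

        // Resolved from fs.defaultFS (typically HDFS).
        FileSystem defaultFs = FileSystem.get(conf);

        System.out.println(fsFromPath.getUri()); // s3a://my-bucket
        System.out.println(defaultFs.getUri());  // e.g. hdfs://namenode:8020
    }
}
```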





[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-27 Thread Bill Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886220#comment-15886220
 ] 

Bill Liu commented on FLINK-5668:
-

[~rmetzger]
[~wheat9] and I are working on implementing a Flink job deployer for YARN with `HttpFs` and `S3`.
The YARN container can resolve the `http`/`s3` file schemes.

We use `HttpFs` instead of `HDFS` to bootstrap the JobManager.
Here is the code to set up the AM container (JobManager):
```
// Register flink-dist.jar (served over HTTP here) as a YARN local resource for the
// ApplicationMaster container, resolving the filesystem from the Path itself.
Path resourcePath = new Path("http://localhost:19989/flink-dist.jar");
FileStatus fileStatus = resourcePath.getFileSystem(yarnConfiguration)
    .getFileStatus(resourcePath);
LOG.info("resource {}", ConverterUtils.getYarnUrlFromPath(resourcePath));
LocalResource packageResource =
    LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(resourcePath),
        LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
        fileStatus.getLen(), fileStatus.getModificationTime());
LOG.info("add localresource {}", packageResource);
localResources.put("flink.jar", packageResource);
amContainer.setLocalResources(localResources);
```
`yarn.deploy.fs` is not a good idea, because these bootstrap jars/files may be located on different filesystems.
It's better to parse the jar's Path to determine the underlying filesystem of the jar.




[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-27 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885456#comment-15885456
 ] 

Robert Metzger commented on FLINK-5668:
---

Sorry that I did not look at this JIRA earlier.

[~bill.liu8904] and [~wheat9] if I understand you correctly, you want Flink on 
YARN not to use Hadoop's {{fs.defaultFS}} configuration for choosing the 
filesystem used to distribute jars and configuration files during deployment?

This basically means that we would need to provide a custom configuration key in Flink (something like {{yarn.deploy.fs}}) that determines where the deployment artifacts are put. This would allow you to use S3 for deploying Flink and HDFS for RocksDB or other state backups.
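
For illustration only -- {{yarn.deploy.fs}} is a hypothetical key here, not an existing Flink option -- the resolution could look roughly like this sketch:

```
// Sketch: choose the deployment filesystem from a (hypothetical) yarn.deploy.fs key,
// falling back to Hadoop's fs.defaultFS when the key is not set.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DeployFsSketch {

    static FileSystem resolveDeployFs(
            org.apache.flink.configuration.Configuration flinkConfig,
            Configuration hadoopConfig) throws Exception {

        String deployFsUri = flinkConfig.getString("yarn.deploy.fs", null);
        if (deployFsUri == null) {
            return FileSystem.get(hadoopConfig); // default filesystem
        }
        return FileSystem.get(URI.create(deployFsUri), hadoopConfig); // e.g. "s3a://my-bucket"
    }
}
```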

I'm not able to completely understand FLINK-5631: How can I register an 
additional jar from a different file system as a required resource?



[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885434#comment-15885434
 ] 

ASF GitHub Bot commented on FLINK-5668:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/3413
  
I would like to discuss the implementation approach in the 
[JIRA](https://issues.apache.org/jira/browse/FLINK-5668) before reviewing this 
pull request.




[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883521#comment-15883521
 ] 

ASF GitHub Bot commented on FLINK-5668:
---

GitHub user billliuatuber opened a pull request:

https://github.com/apache/flink/pull/3413

[FLINK-5668] Reduce dependency on HDFS at job startup time

In the current implementation, the JobManager writes the TaskManager configuration into a file and uploads it to HDFS. This file is used to bootstrap the TaskManager.

In this PR, the configuration is passed from the JobManager to the TaskManager through the container environment instead of a file on HDFS, which reduces the dependency on HDFS at startup.
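
As an illustration of the reading side, here is a sketch (the environment variable name and the key=value-per-line encoding are assumptions, not the PR's actual code):

```
// Sketch: bootstrap the TaskManager config from the container environment.
// FLINK_TM_DYNAMIC_PROPERTIES and the KEY=VALUE-per-line encoding are assumptions.
import org.apache.flink.configuration.Configuration;

public class TaskManagerBootstrapSketch {

    static Configuration loadConfigFromEnv(Configuration baseConfig) {
        Configuration cfg = baseConfig.clone();
        String serialized = System.getenv("FLINK_TM_DYNAMIC_PROPERTIES");
        if (serialized == null) {
            return cfg;
        }
        for (String line : serialized.split("\n")) {
            int idx = line.indexOf('=');
            if (idx > 0) {
                cfg.setString(line.substring(0, idx), line.substring(idx + 1));
            }
        }
        return cfg;
    }
}
```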
  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/billliuatuber/flink FLINK-5668

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3413


commit 547b333203db8e01737a690e06ad5f5663d7faca
Author: Bill Liu 
Date:   2017-02-21T03:27:46Z

[FLINK-5668] reduce hdfs dependency at startup time






[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-01 Thread Bill Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849138#comment-15849138
 ] 

Bill Liu commented on FLINK-5668:
-

Thanks [~wheat9] for filling in the full context.
YARN's own fault tolerance and high availability rely on HDFS, but that doesn't mean Flink-on-YARN has to depend on HDFS; in particular, some of the HDFS dependencies are not necessary at all.
Regarding the TaskManager configuration file, I took a close look at the code: the TaskManager config is cloned from the base config and then only slightly modified.
```
final Configuration taskManagerConfig = BootstrapTools.generateTaskManagerConfiguration(
    config, akkaHostname, akkaPort, slotsPerTaskManager, TASKMANAGER_REGISTRATION_TIMEOUT);

public static Configuration generateTaskManagerConfiguration(
        Configuration baseConfig,
        String jobManagerHostname,
        int jobManagerPort,
        int numSlots,
        FiniteDuration registrationTimeout) {

    Configuration cfg = baseConfig.clone();

    cfg.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerHostname);
    cfg.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerPort);
    cfg.setString(ConfigConstants.TASK_MANAGER_MAX_REGISTRATION_DURATION, registrationTimeout.toString());

    if (numSlots != -1) {
        cfg.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots);
    }

    return cfg;
}
```

If the JobManager web server is not a good place to share files, the JobManager doesn't need to create a local taskmanager-conf.yaml at all; it could just ship the base config file plus a few dynamic properties that override the values in the base config.
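
A minimal sketch of what "base config plus dynamic overrides" could look like (the helper below is hypothetical, not an existing Flink method; only {{Configuration#clone}} and {{Configuration#setString}} from the snippet above are real APIs):

```
// Sketch: ship the unchanged base config and apply a handful of dynamic
// property overrides instead of generating a full taskmanager-conf.yaml.
// applyOverrides is a hypothetical helper for illustration.
import java.util.Map;

import org.apache.flink.configuration.Configuration;

public class DynamicPropertiesSketch {

    static Configuration applyOverrides(Configuration baseConfig, Map<String, String> overrides) {
        Configuration cfg = baseConfig.clone();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            // Each dynamic property (e.g. jobmanager.rpc.address) replaces the base value.
            cfg.setString(e.getKey(), e.getValue());
        }
        return cfg;
    }
}
```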





[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

2017-02-01 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849048#comment-15849048
 ] 

Haohui Mai commented on FLINK-5668:
---

Please allow me to fill in some of the context here.

The request is to have Flink support alternative filesystems (e.g., S3) in Flink-on-YARN so that our mission-critical jobs can survive unavailability of HDFS. Flink-on-YARN still depends on the underlying distributed file system to implement its high availability and reliability requirements; this jira has no intention of changing the current mechanisms in Flink.

You are right that YARN itself depends on a distributed file system to function correctly. It works well with HDFS, but in general it also works with any filesystem that implements the Hadoop `FileSystem` API. There are multiple production deployments that run YARN on S3.

Essentially we would like to take the approach of FLINK-5631 in a more comprehensive way -- in many places the Flink-on-YARN implementation simply takes the default file system from YARN. Since {{Path}} objects already specify their filesystem, it would be great to teach Flink to resolve {{Path}} objects properly, just as FLINK-5631 has done, so that it becomes possible to run Flink-on-YARN on alternative filesystems such as S3.

Does it make sense to you [~StephanEwen]?
