[
https://issues.apache.org/jira/browse/SUBMARINE-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811491#comment-16811491
]
Szilard Nemeth edited comment on SUBMARINE-47 at 4/6/19 7:27 AM:
-----------------------------------------------------------------
Hi [~sunilg]!
I meant the jenkins failure above my comment. There was a protobuf version
issue, something unrelated to my patch it seems. Should we retrigger in this
case?
Thanks!
was (Author: snemeth):
Hi [~sunilg]!
I mrant the jenkins failure above my comment. There was a protobuf version
issue, something unrelated to my patch it seems. Should we retrigger in this
case?
Thanks!
> Provide an implementation to parse configuration values from a YAML file for
> submarine run CLI
> ----------------------------------------------------------------------------------------------
>
> Key: SUBMARINE-47
> URL: https://issues.apache.org/jira/browse/SUBMARINE-47
> Project: Hadoop Submarine
> Issue Type: New Feature
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Fix For: 0.2.0
>
> Attachments: SUBMARINE-47.001.patch, SUBMARINE-47.002.patch,
> SUBMARINE-47.002.patch, SUBMARINE-47.003.patch, SUBMARINE-47.003.patch,
> SUBMARINE-47.004.patch, SUBMARINE-47.005.patch, SUBMARINE-47.006.patch,
> SUBMARINE-47.007.patch, SUBMARINE-47.008.patch, valid-config.yaml
>
>
> In the Submarine design doc
> ([https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#heading=h.klmv5hrbgx9l]),
> under section "Job commands" (submarine job run), the document talks about
> the need to implement a YAML parser as an alternative configuration source of
> Submarine.
> This jira provides an implementation of this YAML configuration parser.
> *Some details on the implementation:*
> *Data model and YAML library:*
> SnakeYAML was the library of choice, as we already include it in other
> Hadoop maven modules.
> SnakeYAML requires data classes to parse the YAML structure into Java
> classes, this is very similar to parsing JSON data.
> The data classes are under this package:
> org.apache.hadoop.yarn.submarine.client.cli.param.yaml
> The data model is the following:
> 1. On the root level, we have the class YamlConfigFile. This class has
> fields for configs, roles, scheduling, etc.
> 2. Class Configs is a class that stores various config values.
> 3. We have the class Roles, that stores the role configurations, currently
> we only have Worker and Ps (ParameterServer).
> 4. There's a base class for Roles, called Role. For some reasons, SnakeYAML
> did not like this class as an abstract class, but it's not that important to
> have it like that, so I stopped caring.
> There are several other data classes, please note that all of these classes
> are trying to "mirror" the example YAML file from the design doc, so you can
> check that for more details.
> *Testing:*
> Added a new class called TestRunJobCliParsingYaml that exclusively verifies
> the correctness of YAML parsing.
> I tried to define the names of the testcases as descriptive and easy to
> understand as possible.
> Covering some edge cases was also a quite important factor, like checking
> how parsing handles non-existing files, empty files, etc.
> *ParameterHolder:*
> In order to minimize changes in RunJobParameters, I introduced a
> ParameterHolder class that is passed
> RunJobParameters#updateParametersByParsedCommandline instead of the parsed
> CLI object we had before. This way, I could use the ParameterHolder to store
> the parsed CLI values along with parsed YAML values, so it's a single source
> of configuration. This class can act as a "decision point" about future needs
> on value precedences: For example, if we want to have precedence for some
> configs coming from the CLI over the same configs coming from the YAML file
> (or vica-versa), it is easily doable with the current implementation.
> Currently, all values coming from the YAML config are in precedence.
> *What CLI options are covered by YAML:*
> Every CLI option have it's YAML counterpart, so the coverage is 100%.
> See this method for all the available options: RunJobCli#generateOptions.
> However, the design doc contains 2 fields (called reservation and placement)
> under the 'scheduling' section that are not added to the YAML data classes.
> The reason for this is that I haven't found any implementation for these
> properties with the original CLI parser, either.
> *Questions:*
> 1. Do we want to allow mixing of CLI configs and YAML configs? We could
> either fail-fast if a config is defined twice. With my current
> implementation, the values coming from YAML always have precedence over CLI
> values, so defining a config twice is not an error case yet.
> 2. What types of roles should we allow? Is there any additional apart from
> Worker and Ps?
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)