Re: [Proposal] Extension of the Apex configuration to add dependent jar files in runtime.
IMO, support for Kubernetes, Docker images, Mesos, and anything else outside of YARN deployments is a topic in its own right, and the design for such support needs to be discussed. I do not want to propose any specific design, but I assume that the logic to create a proper execution environment would be coded into the Apex client. Whether that (hardcoded) logic to create an execution environment can be expressed simply as a list of dependent classes or jars is questionable, to say the least. Until a design is proposed and agreed upon, I'd prefer to use plugins for this.

Thank you,
Vlad

On 2/2/18 13:17, Sanjay Pujare wrote:
> [...]
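For reference, here is a minimal Java sketch of the mechanism in Sergey's original proposal (quoted in Thomas's message in this thread): a predefined configuration file in the engine resources, an optional one in the installation "conf" folder, and the end-user's LIB_JARS value, merged into one list of dependent jars. It assumes the Properties syntax (one of the three options the proposal lists), a hypothetical file name `apex-predefined.properties`, and a comma-separated `LIB_JARS` key; none of these are existing Apex APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper; the file name and key below are assumptions, not Apex APIs.
final class PredefinedLibJars {
  static final String FILE_NAME = "apex-predefined.properties";
  static final String KEY = "LIB_JARS";

  private PredefinedLibJars() {}

  /**
   * Collects jars in this order: engine resource, installation conf file,
   * then the end-user's LIB_JARS value. Duplicates keep their first position.
   */
  static Set<String> collect(ClassLoader loader, Path confDir, String userLibJars)
      throws IOException {
    Set<String> jars = new LinkedHashSet<>();

    // 1. Defaults shipped inside the engine jar as a classpath resource (may be absent).
    try (InputStream in = loader.getResourceAsStream(FILE_NAME)) {
      if (in != null) {
        Properties p = new Properties();
        p.load(in);
        addAll(jars, p.getProperty(KEY));
      }
    }

    // 2. Optional installation-specific file, e.g. <install>/conf/apex-predefined.properties.
    if (confDir != null) {
      Path file = confDir.resolve(FILE_NAME);
      if (Files.isRegularFile(file)) {
        try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
          Properties p = new Properties();
          p.load(r);
          addAll(jars, p.getProperty(KEY));
        }
      }
    }

    // 3. Whatever the end-user configured through the documented LIB_JARS attribute.
    addAll(jars, userLibJars);
    return jars;
  }

  /** Splits a comma-separated list, ignoring blank entries; null adds nothing. */
  static void addAll(Set<String> target, String commaSeparated) {
    if (commaSeparated == null) {
      return;
    }
    for (String entry : commaSeparated.split(",")) {
      String trimmed = entry.trim();
      if (!trimmed.isEmpty()) {
        target.add(trimmed);
      }
    }
  }
}
```

Because the predefined entries come from files shipped with the engine or the installation rather than from dt-site.xml, end-users never see or edit them, which is the separation the proposal asks for; where exactly a stram client would call `collect(...)` is left open here.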
Re: [Proposal] Extension of the Apex configuration to add dependent jar files in runtime.
In cases where we have an "über" Docker image containing support for multiple execution environments, it might be useful for the Apex core to infer which execution environment to use for a particular invocation (say, based on configuration values or environment variables); in that case the core would load the corresponding libraries. I think this kind of flexibility would be difficult to achieve through plugins, so I think Sergey's proposal will be useful.

Sanjay

On Fri, Feb 2, 2018 at 11:18 AM, Sergey Golovko wrote:
> [...]
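One way the inference Sanjay describes could work, sketched in Java: an explicit configuration value wins, then an environment variable, then a default, and the chosen name selects a set of libraries already bundled in the image. The configuration key `apex.execution.env`, the variable `APEX_EXECUTION_ENV`, and the catalog layout are hypothetical and not part of Apex.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the key, variable name, and catalog layout are assumptions.
final class EnvironmentSelector {
  static final String CONFIG_KEY = "apex.execution.env";
  static final String ENV_VAR = "APEX_EXECUTION_ENV";

  private EnvironmentSelector() {}

  /** Explicit configuration wins, then the environment variable, then the default. */
  static String resolve(Map<String, String> config, Map<String, String> env, String defaultEnv) {
    String value = config.get(CONFIG_KEY);
    if (value == null || value.isEmpty()) {
      value = env.get(ENV_VAR);
    }
    if (value == null || value.isEmpty()) {
      value = defaultEnv;
    }
    return value;
  }

  /** Returns the jars bundled in the image for the chosen environment. */
  static List<String> librariesFor(String environment, Map<String, List<String>> catalog) {
    List<String> jars = catalog.get(environment);
    if (jars == null) {
      throw new IllegalArgumentException("No libraries registered for environment: " + environment);
    }
    return Collections.unmodifiableList(jars);
  }
}
```

In a client this might be called as `EnvironmentSelector.resolve(config, System.getenv(), "yarn")`, since `System.getenv()` already returns a `Map<String, String>`.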
Re: [Proposal] Extension of the Apex configuration to add dependent jar files in runtime.
Unfortunately, moving the .apa file to a Docker image cannot resolve all problems with dependencies. If we assume an Apex application should be able to run in different execution environments, the application Docker image would have to contain the dependencies of every possible execution environment.

I think the better approach is to assume that the original application Docker image, like the current .apa file, contains only the application-specific dependencies. A smart client tool would then create the executable application Docker image from the original one and add the dependencies specific to the execution environment to the target image. Either way, this means a smart Apex client tool needs an interface for defining the dependencies of different environments, or of combinations of different environment dimensions.

Thanks,
Sergey

On Fri, Feb 2, 2018 at 10:23 AM, Thomas Weise wrote:
> [...]
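A minimal sketch of the composition step Sergey describes for a smart client tool, assuming each environment dimension (for example, cluster manager or file system) maps option names to jar lists. The class, the dimension names, and the jar names are hypothetical; the sketch only computes the final dependency list and does not build an image.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the smart-client composition step; not an Apex API.
final class ImageDependencyComposer {
  private ImageDependencyComposer() {}

  /**
   * @param appJars   application-specific jars (what the .apa or base image carries)
   * @param catalog   dimension -> option -> jars, e.g. "fileSystem" -> "hdfs" -> [...]
   * @param selection dimension -> chosen option for the target environment
   * @return app jars first, then environment jars in selection order, without duplicates
   */
  static List<String> compose(List<String> appJars,
      Map<String, Map<String, List<String>>> catalog,
      Map<String, String> selection) {
    Set<String> result = new LinkedHashSet<>(appJars);
    for (Map.Entry<String, String> choice : selection.entrySet()) {
      Map<String, List<String>> options = catalog.get(choice.getKey());
      List<String> jars = options == null ? null : options.get(choice.getValue());
      if (jars == null) {
        throw new IllegalArgumentException(
            "Unknown option '" + choice.getValue() + "' for dimension '" + choice.getKey() + "'");
      }
      result.addAll(jars); // jars already present keep their original position
    }
    return new ArrayList<>(result);
  }
}
```

Passing the selection as a `LinkedHashMap` keeps the output order deterministic, and the `LinkedHashSet` drops jars that the application and several dimensions share.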
Re: [Proposal] Extension of the Apex configuration to add dependent jar files in runtime.
The current dependencies are based on how the Apex YARN client works. YARN depends on a DFS implementation for deployment (not necessarily HDFS).

I think a better way to look at this is to consider that instead of an .apa file, the application is a Docker image, which would contain Apex and all the dependencies that "StramClient" adds for YARN today.

In that world there would be no Apex CLI or Apex-specific client.

Thomas

On Thu, Feb 1, 2018 at 5:57 PM, Sergey Golovko wrote:
> I agree. It can be implemented using plugins. But if I need to enable and configure the plugin, I need to put this information into dt-site.xml. That means the plugin and its parameters must be documented, and the list of added jars will be visible to, and updatable by, the end-user. The plugin-based implementation is a more dynamic solution, which is more convenient for application developers. But I'm talking about the static configuration of an Apex build or installation, which relates more to platform development.
>
> The current Apex core implementation has used the same static list of jars for a long time, because the Apex implementation still contains several basic static assumptions (for instance, the use of YARN, HDFS, etc.), and these assumptions are hardcoded in the implementation. But if we are going to improve Apex and use Java interfaces in a generic Apex implementation, the current static approach of hardcoding a list of dependent jars in Apex code will not work anymore. We will need a new way to add or change jars in specific Apex builds/configurations, and I don't think plugins are a good fit for that.
>
> Thanks,
> Sergey
>
> On Thu, Feb 1, 2018 at 1:47 PM, Vlad Rozov wrote:
>
>> There is a way to get the same end result by using plugins. It would be good to understand why plugins can't be used, and whether they can be extended to provide the required functionality.
>>
>> Thank you,
>>
>> Vlad
>>
>> On 1/29/18 15:14, Sergey Golovko wrote:
>>
>>> Hello All,
>>>
>>> In Apex there are two ways to deploy non-Hadoop jars to the cluster.
>>>
>>> The first approach is static (hardcoded) and is used by Apex platform developers only. Several final static arrays of Java classes in StramClient.java define which of the available jars are included in the deployment of every Apex application.
>>>
>>> The second approach is to add the paths of all dependent jar files to the value of the attribute LIB_JARS. The end-user can set or update the value of LIB_JARS via dt-site.xml files, command-line parameters, application properties, and plugins. Using the LIB_JARS attribute is the officially documented way for all Apex users to manage deployment jars.
>>>
>>> But some of the dependent jars (not from the Apex core) can be common to all of a customer's applications for a specific installation and/or execution environment. Unfortunately, Apex has no middle-ground solution that would allow Apex developers and customer support to define and add new dependent jar files (jars that should not be configurable/managed by the end-user) without updating/recompiling the Apex Java code during the Apex build process and/or installation/configuration.
>>>
>>> Having this kind of flexibility would also allow Apex core developers to use Java interfaces to define an abstraction layer in the Apex implementation, and to configure the Apex core to add specific jars to all Apex applications without recompiling the Apex source code.
>>>
>>> For instance, the use of HDFS is currently hardcoded in the Apex platform code, but it could be replaced with any other distributed or cloud-based file system. The Apex core code could use an interface for all I/O operations, while support for a specific file system implementation is added as an independent jar file. Or, if the implementation of some Apex operators depends on a specific service, it may be necessary to add some of the service jars to every Apex application implicitly.
>>>
>>> The proposal:
>>>
>>> - add a predefined configuration text file (any syntax would do: XML, JSON or Properties) to the Apex engine resources, with predefined values for some Apex attributes (for now, only the LIB_JARS attribute);
>>> - allow a configuration text file with the same functionality in the Apex installation folder "conf";
>>> - have the stram client read the content of the predefined configuration text files at runtime and add the jars to the list of dependent jars;
>>> - allow to use paths to jars and