[GitHub] shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor
shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor URL: https://github.com/apache/samza/pull/920#discussion_r258307442 ## File path: samza-core/src/main/java/org/apache/samza/execution/JobPlanner.java ## @@ -155,4 +158,33 @@ final void writePlanJsonFile(String planJson) { systemStreamConfigs.put(JobConfig.JOB_DEFAULT_SYSTEM(), dsd.getSystemName())); return systemStreamConfigs; } + + /** + * Generates job.id from app.id and job.name from app.name config + * If both job.id and app.id is defined, app.id takes precedence and job.id is set to value of app.id + * If both job.name and app.name is defined, app.name takes precedence and job.name is set to value of app.name + * + * @param allowedUserConfig configs passed from user + */ + void generateJobIdAndName(Map allowedUserConfig) { +if (allowedUserConfig.containsKey(JobConfig.JOB_ID())) { + LOG.warn("{} is a deprecated configuration, use app.id instead.", JobConfig.JOB_ID()); +} + +if (allowedUserConfig.containsKey(JobConfig.JOB_NAME())) { + LOG.warn("{} is a deprecated configuration, use use app.name instead.", JobConfig.JOB_NAME()); +} + +if (allowedUserConfig.containsKey(ApplicationConfig.APP_NAME)) { Review comment: It is a good practice not to mutate the arguments(`allowedUserConfig`) as a part of the method implementation. It will improve readability and be simpler when someone has to re-read again in the future. Rather than having return type void in this implementation, may be you can return the updated configuration, like ``` MapConfig calculateJobIdAndJobName(String jobName, String jobId) ``` At caller, you can simply add the return value you received from this method invocation to the final configuration bag. Please change if this makes sense. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor
shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor URL: https://github.com/apache/samza/pull/920#discussion_r258306761 ## File path: samza-core/src/main/scala/org/apache/samza/coordinator/stream/CoordinatorStreamWriterCommandLine.scala ## @@ -68,4 +72,27 @@ class CoordinatorStreamWriterCommandLine extends CommandLine { value } + + def genJobConfigs(config: MapConfig): MapConfig = { Review comment: Rather than duplicating this code at two places, could you please extract them into a util method and move the duplicated code there. Invoke the util method at all the places where you're calling this method. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor
shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor URL: https://github.com/apache/samza/pull/920#discussion_r258308009 ## File path: samza-core/src/test/java/org/apache/samza/runtime/TestLocalApplicationRunner.java ## @@ -75,8 +75,8 @@ public void setUp() { public void testRunStreamTask() { final Map cfgs = new HashMap<>(); cfgs.put(ApplicationConfig.APP_PROCESSOR_ID_GENERATOR_CLASS, UUIDGenerator.class.getName()); -cfgs.put(JobConfig.JOB_NAME(), "test-task-job"); -cfgs.put(JobConfig.JOB_ID(), "jobId"); +cfgs.put(ApplicationConfig.APP_NAME, "test-task-job"); +cfgs.put(ApplicationConfig.APP_ID, "jobId"); Review comment: s/jobId/testAppId s/test-task-job/testAppName This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor
shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor URL: https://github.com/apache/samza/pull/920#discussion_r258307896 ## File path: samza-test/src/main/java/org/apache/samza/test/framework/TestRunner.java ## @@ -104,7 +105,7 @@ private TestRunner() { this.configs = new HashMap<>(); this.inMemoryScope = RandomStringUtils.random(10, true, true); -configs.put(JobConfig.JOB_NAME(), JOB_NAME); +configs.put(ApplicationConfig.APP_NAME, JOB_NAME); Review comment: Rename the `JOB_NAME` constant to `APP_NAME`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor
shanthoosh commented on a change in pull request #920: SAMZA-2106: Samza app and job config refactor URL: https://github.com/apache/samza/pull/920#discussion_r258141427 ## File path: samza-core/src/main/java/org/apache/samza/execution/JobPlanner.java ## @@ -155,4 +158,33 @@ final void writePlanJsonFile(String planJson) { systemStreamConfigs.put(JobConfig.JOB_DEFAULT_SYSTEM(), dsd.getSystemName())); return systemStreamConfigs; } + + /** + * Generates job.id from app.id and job.name from app.name config + * If both job.id and app.id is defined, app.id takes precedence and job.id is set to value of app.id + * If both job.name and app.name is defined, app.name takes precedence and job.name is set to value of app.name + * + * @param allowedUserConfig configs passed from user + */ + void generateJobIdAndName(Map allowedUserConfig) { +if (allowedUserConfig.containsKey(JobConfig.JOB_ID())) { + LOG.warn("{} is a deprecated configuration, use app.id instead.", JobConfig.JOB_ID()); Review comment: 1. Can you please deprecate the job.id, job.name configurations from the samza configuration table. 2. There're some samples in [samza-hello-samza](https://github.com/apache/samza-hello-samza/blob/master/src/main/config/azure-application-local-runner.properties) in which job.id, job.name are defined as configuration. Could you please update them to use app.id, app.name. Could you check if the same point applies to the samza-hello-samza samples within linkedin? 3. Could you please update the integration test files located in samza-test/src/test/ to use the new app.id, app.name configuration. [Example](https://github.com/apache/samza/blob/master/samza-test/src/main/config/join/emitter.samza) 4. From what I see, we're doing configuration aliasing in the `LocalApplicationRunner`, `JobPlanner` directly. Could you share what is the rationale behind this decision? Why not do this in a `Rewriter`? 5. Consider a case where user has defined only `app.id`, `app.name` in the configuration and launches their low-level application. Could you please share how will submitting `YarnJob` through `JobRunner` work? `run-job.sh` launches `YarnJob` which requires the `job.name`, `job.id` to built through `CoordinatorStreamUtil.buildCoordinatorStreamConfig` ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services