[GitHub] [incubator-gobblin] yukuai518 commented on issue #2648: [GOBBLIN-784] Allow setting replication factor in distcp
yukuai518 commented on issue #2648: [GOBBLIN-784] Allow setting replication factor in distcp URL: https://github.com/apache/incubator-gobblin/pull/2648#issuecomment-495805911 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Work logged] (GOBBLIN-784) Allow setting replication factor in distcp
[ https://issues.apache.org/jira/browse/GOBBLIN-784?focusedWorklogId=248339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248339 ]

ASF GitHub Bot logged work on GOBBLIN-784:

Author: ASF GitHub Bot
Created on: 24/May/19 22:34
Start Date: 24/May/19 22:34
Worklog Time Spent: 10m
Work Description: jack-moseley commented on pull request #2648: [GOBBLIN-784] Allow setting replication factor in distcp
URL: https://github.com/apache/incubator-gobblin/pull/2648

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-784

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
If `writer.file.replication.factor` is set, override the default behavior of distcp (which is to use the source file's replication factor or the filesystem default).

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
Trivial change and tested locally

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Issue Time Tracking
---
Worklog Id: (was: 248339)
Time Spent: 10m
Remaining Estimate: 0h

> Allow setting replication factor in distcp
> ---
>
> Key: GOBBLIN-784
> URL: https://issues.apache.org/jira/browse/GOBBLIN-784
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Jack Moseley
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
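The precedence described in the PR for GOBBLIN-784 (an explicit `writer.file.replication.factor` wins; otherwise distcp keeps its default of either the source file's factor or the filesystem default) can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the class name, the `resolveReplication` method, and the `preserveSource` flag are hypothetical, and a plain `java.util.Properties` stands in for Gobblin's own state objects.

```java
import java.util.Properties;

// Hypothetical sketch: chooses the replication factor for a copied file.
class ReplicationFactorSketch {
  // Property name taken from the PR description.
  static final String REPLICATION_KEY = "writer.file.replication.factor";

  /**
   * @param props             job properties (assumed container, for illustration only)
   * @param preserveSource    assumed flag: whether distcp should keep the source file's factor
   * @param sourceReplication replication factor of the source file
   * @param fsDefault         default replication factor of the target filesystem
   */
  static short resolveReplication(Properties props, boolean preserveSource,
      short sourceReplication, short fsDefault) {
    String configured = props.getProperty(REPLICATION_KEY);
    if (configured != null && !configured.trim().isEmpty()) {
      // An explicit setting overrides the default distcp behavior.
      return Short.parseShort(configured.trim());
    }
    // Default behavior: source factor if preserved, else filesystem default.
    return preserveSource ? sourceReplication : fsDefault;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolveReplication(props, true, (short) 3, (short) 2));
    props.setProperty(REPLICATION_KEY, "1");
    System.out.println(resolveReplication(props, true, (short) 3, (short) 2));
  }
}
```

Keeping the override check first means jobs that do not set the property behave exactly as before.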
[jira] [Created] (GOBBLIN-784) Allow setting replication factor in distcp
Jack Moseley created GOBBLIN-784:

Summary: Allow setting replication factor in distcp
Key: GOBBLIN-784
URL: https://issues.apache.org/jira/browse/GOBBLIN-784
Project: Apache Gobblin
Issue Type: Improvement
Reporter: Jack Moseley
[jira] [Work logged] (GOBBLIN-781) Clean Flow execution state when DR is enabled: Skeleton
[ https://issues.apache.org/jira/browse/GOBBLIN-781?focusedWorklogId=248286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248286 ]

ASF GitHub Bot logged work on GOBBLIN-781:

Author: ASF GitHub Bot
Created on: 24/May/19 21:06
Start Date: 24/May/19 21:06
Worklog Time Spent: 10m
Work Description: autumnust commented on issue #2647: [GOBBLIN-781] Skeleton for GaaS DR mode clean transition
URL: https://github.com/apache/incubator-gobblin/pull/2647#issuecomment-495787988

@sv2000 @arjun4084346 Can you take a look? Thanks.

Issue Time Tracking
---
Worklog Id: (was: 248286)
Time Spent: 20m (was: 10m)

> Clean Flow execution state when DR is enabled: Skeleton
> ---
>
> Key: GOBBLIN-781
> URL: https://issues.apache.org/jira/browse/GOBBLIN-781
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Lei Sun
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
[GitHub] [incubator-gobblin] autumnust opened a new pull request #2647: [GOBBLIN-781] Skeleton for GaaS DR mode clean transition
autumnust opened a new pull request #2647: [GOBBLIN-781] Skeleton for GaaS DR mode clean transition
URL: https://github.com/apache/incubator-gobblin/pull/2647

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

A little background on this PR: In a Gobblin-as-a-Service (GaaS) deployment, there are extreme cases in which the whole data center hosting a GaaS cluster goes down. In that case we launch another GaaS cluster to handle request serving and orchestration. To guarantee a valid working state for the DR-nominated GaaS cluster, we need to clean certain left-over states. Also, not all `FlowSpec`s are eligible to run in an alternative environment, for example because of data affinity. So we give `SpecStore` the ability to tag certain `FlowSpec`s, and only those will be picked up by the DR-nominated cluster.

### JIRA
- [x] My PR addresses the following [Gobblin JIRA]
    - https://issues.apache.org/jira/browse/GOBBLIN-781

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
This PR mainly solves the problems mentioned in the background section by:
- Adding a "tag" to `MysqlSpecStore`.
- Adding `isNominatedDRHandler` as a configuration for a newly launched GaaS master node.
- Adding logic inside `scheduleSpecsFromCatalog` to pick up tagged `FlowSpec`s on the DR-nominated node. This is just skeleton code for now, since the real cancellation feature is not there yet.
- Fixing some of the API in `SpecStore` and deprecating the old method that loads all `Spec`s into memory.
- Some changes in `GobblinServiceManager` to get rid of left-overs from the prototyping phase.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
- Unit test for loading `spec`s with a tag.

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
[GitHub] [incubator-gobblin] ibuenros commented on a change in pull request #2637: [GOBBLIN-772]Implement Schema Comparison Strategy during Disctp
ibuenros commented on a change in pull request #2637: [GOBBLIN-772]Implement Schema Comparison Strategy during Disctp
URL: https://github.com/apache/incubator-gobblin/pull/2637#discussion_r287477265

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/extractor/FileAwareInputStreamExtractorWithCheckSchema.java ##
@@ -39,32 +43,38 @@
  * check if the schema matches the expected schema. If not it will abort the job.
  */
-public class FileAwareInputStreamExtractorWithCheckSchema extends FileAwareInputStreamExtractor{
+public class FileAwareInputStreamExtractorWithCheckSchema extends FileAwareInputStreamExtractor {

Review comment:
@jhsenjaliya the reason we did this in a different class is because it is leaking Avro details. CopySource and the associated extractor are able to copy arbitrary files simply as byte streams. This particular extension requires the copied files to be Avro. In general, I think that in a byte copier any functionality that assumes a schema associated with those bytes is pollution of the logic. What do you think?
[jira] [Work logged] (GOBBLIN-783) Fix the double referencing issue for job type config
[ https://issues.apache.org/jira/browse/GOBBLIN-783?focusedWorklogId=248206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248206 ]

ASF GitHub Bot logged work on GOBBLIN-783:

Author: ASF GitHub Bot
Created on: 24/May/19 18:30
Start Date: 24/May/19 18:30
Worklog Time Spent: 10m
Work Description: yukuai518 commented on issue #2646: [GOBBLIN-783] Fix the double referencing issue for job type config.
URL: https://github.com/apache/incubator-gobblin/pull/2646#issuecomment-495745282

@autumnust, please review.

Issue Time Tracking
---
Worklog Id: (was: 248206)
Time Spent: 20m (was: 10m)

> Fix the double referencing issue for job type config
> ---
>
> Key: GOBBLIN-783
> URL: https://issues.apache.org/jira/browse/GOBBLIN-783
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Kuai Yu
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Created] (GOBBLIN-783) Fix the double referencing issue for job type config
Kuai Yu created GOBBLIN-783:

Summary: Fix the double referencing issue for job type config
Key: GOBBLIN-783
URL: https://issues.apache.org/jira/browse/GOBBLIN-783
Project: Apache Gobblin
Issue Type: Bug
Reporter: Kuai Yu
[jira] [Work logged] (GOBBLIN-783) Fix the double referencing issue for job type config
[ https://issues.apache.org/jira/browse/GOBBLIN-783?focusedWorklogId=248205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248205 ]

ASF GitHub Bot logged work on GOBBLIN-783:

Author: ASF GitHub Bot
Created on: 24/May/19 18:30
Start Date: 24/May/19 18:30
Worklog Time Spent: 10m
Work Description: yukuai518 commented on pull request #2646: [GOBBLIN-783] Fix the double referencing issue for job type config.
URL: https://github.com/apache/incubator-gobblin/pull/2646

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-783

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
Fix the double referencing issue for job type config.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
No test required.

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Issue Time Tracking
---
Worklog Id: (was: 248205)
Time Spent: 10m
Remaining Estimate: 0h

> Fix the double referencing issue for job type config
> ---
>
> Key: GOBBLIN-783
> URL: https://issues.apache.org/jira/browse/GOBBLIN-783
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Kuai Yu
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
[GitHub] [incubator-gobblin] jack-moseley opened a new pull request #2645: [GOBBLIN-782] Add config to inject properties to all orchestrated jobs
jack-moseley opened a new pull request #2645: [GOBBLIN-782] Add config to inject properties to all orchestrated jobs
URL: https://github.com/apache/incubator-gobblin/pull/2645

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-782

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
Add an `injectProperty` config: all configs under `injectProperty.*` in the system config are injected into every job orchestrated by the Gobblin service. They are added as fallback properties, so flow/job properties can override them.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
Trivial change and tested locally

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
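The fallback semantics in the description of PR #2645 can be illustrated with a small sketch. It is not the PR's code: the class and method names are hypothetical, plain `Map`s stand in for Gobblin's config objects, and stripping the `injectProperty.` prefix from injected keys is an assumption made here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "inject as fallback": job properties win on conflict.
class InjectPropertySketch {
  // Prefix taken from the PR description; stripping it is an assumption.
  static final String PREFIX = "injectProperty.";

  static Map<String, String> withInjected(Map<String, String> systemConfig,
      Map<String, String> jobConfig) {
    Map<String, String> result = new HashMap<>();
    // Start from the injected system properties (the fallback layer).
    for (Map.Entry<String, String> e : systemConfig.entrySet()) {
      if (e.getKey().startsWith(PREFIX)) {
        result.put(e.getKey().substring(PREFIX.length()), e.getValue());
      }
    }
    // Flow/job properties are applied last, so they override injected values.
    result.putAll(jobConfig);
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> system = new HashMap<>();
    system.put("injectProperty.queue", "default");
    system.put("unrelated.key", "x");
    Map<String, String> job = new HashMap<>();
    job.put("queue", "etl");
    System.out.println(withInjected(system, job));
  }
}
```

Applying the job properties last is what makes the injected values a fallback rather than an override.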
[jira] [Work logged] (GOBBLIN-772) Implement Schema Comparison Strategy during Disctp
[ https://issues.apache.org/jira/browse/GOBBLIN-772?focusedWorklogId=248144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248144 ]

ASF GitHub Bot logged work on GOBBLIN-772:

Author: ASF GitHub Bot
Created on: 24/May/19 16:57
Start Date: 24/May/19 16:57
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2637: [GOBBLIN-772]Implement Schema Comparison Strategy during Disctp
URL: https://github.com/apache/incubator-gobblin/pull/2637#discussion_r287440685

## File path: gobblin-restli/gobblin-throttling-service/gobblin-throttling-service-api/src/main/snapshot/org.apache.gobblin.restli.throttling.permits.snapshot.json ##
@@ -17,6 +17,18 @@
 "type" : "long",
 "doc" : "Client should not try to acquire permits before this delay has passed.",
 "optional" : true
+}, {

Review comment:
Got the point. Just mention in the description why this change (which is unrelated to the PR itself) shows up as part of the code change.

Issue Time Tracking
---
Worklog Id: (was: 248144)
Time Spent: 2h 10m (was: 2h)

> Implement Schema Comparison Strategy during Disctp
> ---
>
> Key: GOBBLIN-772
> URL: https://issues.apache.org/jira/browse/GOBBLIN-772
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Zihan Li
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> We need a schema comparison strategy to make sure the real schema and the expected schema have matching field names and types.
[jira] [Work logged] (GOBBLIN-772) Implement Schema Comparison Strategy during Disctp
[ https://issues.apache.org/jira/browse/GOBBLIN-772?focusedWorklogId=248143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248143 ]

ASF GitHub Bot logged work on GOBBLIN-772:

Author: ASF GitHub Bot
Created on: 24/May/19 16:57
Start Date: 24/May/19 16:57
Worklog Time Spent: 10m
Work Description: autumnust commented on pull request #2637: [GOBBLIN-772]Implement Schema Comparison Strategy during Disctp
URL: https://github.com/apache/incubator-gobblin/pull/2637#discussion_r287436534

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/util/schema_check/AvroSchemaCheckStrategy.java ##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.util.schema_check;
+
+import org.apache.avro.Schema;
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.gobblin.configuration.WorkUnitState;
+
+
+/**
+ * The strategy to compare Avro schema.
+ */
+public interface AvroSchemaCheckStrategy {
+  /**
+   * A factory to initiate the Strategy
+   */
+  class AvroSchemaCheckStrategyFactory {
+    /**
+     * Use the configuration to create a schema check strategy. If it's not found, return null.
+     * @param state
+     * @return
+     */
+    public static AvroSchemaCheckStrategy create(WorkUnitState state)
+    {
+      try {
+        return (AvroSchemaCheckStrategy) Class.forName(state.getProp(ConfigurationKeys.AVRO_SCHEMA_CHECK_STRATEGY, ConfigurationKeys.AVRO_SCHEMA_CHECK_STRATEGY_DEFAULT)).newInstance();
+      } catch (Exception e)
+      {

Review comment:
nitpick: usually we would have `{` on the same line as `catch`

Issue Time Tracking
---
Worklog Id: (was: 248143)

> Implement Schema Comparison Strategy during Disctp
> ---
>
> Key: GOBBLIN-772
> URL: https://issues.apache.org/jira/browse/GOBBLIN-772
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Zihan Li
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> We need a schema comparison strategy to make sure the real schema and the expected schema have matching field names and types.
[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2637: [GOBBLIN-772]Implement Schema Comparison Strategy during Disctp
autumnust commented on a change in pull request #2637: [GOBBLIN-772]Implement Schema Comparison Strategy during Disctp
URL: https://github.com/apache/incubator-gobblin/pull/2637#discussion_r287437052

## File path: gobblin-data-management/src/main/java/org/apache/gobblin/util/schema_check/AvroSchemaCheckStrategy.java ##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.util.schema_check;
+
+import org.apache.avro.Schema;
+import org.apache.gobblin.configuration.ConfigurationKeys;
+import org.apache.gobblin.configuration.WorkUnitState;
+
+
+/**
+ * The strategy to compare Avro schema.
+ */
+public interface AvroSchemaCheckStrategy {
+  /**
+   * A factory to initiate the Strategy
+   */
+  class AvroSchemaCheckStrategyFactory {
+    /**
+     * Use the configuration to create a schema check strategy. If it's not found, return null.
+     * @param state
+     * @return
+     */
+    public static AvroSchemaCheckStrategy create(WorkUnitState state)
+    {
+      try {
+        return (AvroSchemaCheckStrategy) Class.forName(state.getProp(ConfigurationKeys.AVRO_SCHEMA_CHECK_STRATEGY, ConfigurationKeys.AVRO_SCHEMA_CHECK_STRATEGY_DEFAULT)).newInstance();
+      } catch (Exception e)
+      {
+        return null;

Review comment:
An exception that is caught but neither propagated nor logged is a silent failure, which is usually not what we want. If there is nothing (no recovery of any kind) we can do after catching it, let's at least log the exception as a starting point.
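What the reviewer asks for might look roughly like the sketch below: the reflective factory still returns null on failure, but the exception is logged first. This is an illustration only and not the code that was merged. The `CheckStrategy` interface, the `ExactMatch` class, and the `create(String)` signature are hypothetical stand-ins for the Gobblin types in the diff, and `java.util.logging` is used only to keep the example dependency-free.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of a reflective factory that logs instead of failing silently.
class StrategyFactorySketch {
  private static final Logger LOG = Logger.getLogger(StrategyFactorySketch.class.getName());

  // Stand-in for the strategy interface in the diff (assumed shape).
  interface CheckStrategy {
    boolean compare(String expected, String actual);
  }

  // Trivial implementation used only to exercise the factory.
  static class ExactMatch implements CheckStrategy {
    @Override
    public boolean compare(String expected, String actual) {
      return expected.equals(actual);
    }
  }

  /** Instantiates the named strategy class, or logs the failure and returns null. */
  static CheckStrategy create(String className) {
    try {
      return (CheckStrategy) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      // Log with the cause attached so a misconfigured class name is visible.
      LOG.log(Level.WARNING, "Could not create schema check strategy " + className, e);
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(create(ExactMatch.class.getName()) != null);
    System.out.println(create("no.such.Strategy") == null);
  }
}
```

Catching `ReflectiveOperationException` rather than `Exception` also keeps unrelated runtime errors from being swallowed by the same handler.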
[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=248136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-248136 ]

ASF GitHub Bot logged work on GOBBLIN-707:

Author: ASF GitHub Bot
Created on: 24/May/19 16:40
Start Date: 24/May/19 16:40
Worklog Time Spent: 10m
Work Description: htran1 commented on pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r287436214

## File path: bin/gobblin-admin.sh ##
@@ -1,142 +0,0 @@
-#!/bin/bash

Review comment:
How about maintaining compatibility with callers of the old script? One option is to leave the old scripts as wrapper scripts that call the new one. Another option is to have the old script names be symlinks to the new script and detect the usage based on the value of $0.

Issue Time Tracking
---
Worklog Id: (was: 248136)
Time Spent: 6h 50m (was: 6h 40m)

> combine & standardize all gobblin scripts into one master script & restructure configs accordingly
> ---
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Jay Sen
> Priority: Major
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> gobblin supports multiple modes of execution (CLI, Standalone, cluster-master, cluster-worker, AWS, YARN, MR) and various command-line utilities to run cli and admin commands. There is an individual script for each of them.
> Having individual scripts introduces a lot of issues:
> # all scripts handle gobblin variables and user parameters differently, and this is highly inconsistent among the various gobblin scripts
> # functionality around start, stop, status checking and PID handling, among a lot of other things, varies vastly with the implementation of each script.
> # features like GC & JVM params, log4j file selection, classpath calculation, etc. exist in some gobblin scripts but not all, adding to the inconsistent user experience.
> # maintaining a total of 13 scripts would be too much effort.
> Also, all the gobblin scripts share a lot of common code to handle params, start and stop services, status checks, PID handling, etc. Combining all the scripts into one not only makes maintenance easier but also brings clarity and consistency.
>
> Solution:
> 1. there can be one gobblin.sh script to handle all gobblin commands and deployment options as per the following signature. NOTE: This
> {{gobblin.sh }}
> {{gobblin.sh }}
> {{commands values: admin, cli, statestore-check, statestore-clean, historystore-manager, classpath}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, service}}
> with the above change, the following become valid commands.
> {code:java} > # all under GobblinCli class > gobblin run listQuickApps –> gobblin cli run listQuickApps > gobblin run listQuickApps –> gobblin cli run listQuickApps > gobblin run -> gobblin cli run > # class: JobStateToJsonConverter > statestore-checker.sh -> gobblin statestore-checker > # class: StateStoreCleaner > statestore-clean.sh -> gobblin statestore-clean > # class: DatabaseJobHistoryStoreSchemaManager > historystore-manager.sh -> gobblin historystore-manager > # class: Cli > gobblin-admin.sh-> gobblin admin > # all gobblin deployment modes > gobblin-cluster-master.sh -> gobblin cluster-mater start|stop|status > gobblin-cluster-worker.sh -> gobblin cluster-mater start|stop|status > gobblin-compaction.sh -> gobblin cluster-mater start|stop|status > gobblin-env.sh -> gobblin cluster-mater start|stop|status > gobblin-mapreduce.sh-> gobblin cluster-mater start|stop|status > gobblin-service.sh -> gobblin cluster-mater start|stop|status > gobblin-standalone.sh -> gobblin cluster-mater start|stop|status > gobblin-yarn.sh -> gobblin cluster-mater start|stop|status > {code} > > 2. Also configs needs to be structured and deduped accordingly to make it > clear on which config will be picked up for which execution mode. > {color:#ff} > NOTE: this refactoring adds all cli and service commands to gobblin.sh and > hence changes the syntax for all commands and services.{color} -- This message was sent by Atlassian
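The single-entry-point idea in the GOBBLIN-707 description can be illustrated with a minimal sketch. It relies only on the command and service value lists and the start|stop|status actions quoted in the description; the variable and function names (GOBBLIN_COMMANDS, GOBBLIN_SERVICES, classify_target, is_valid_service_action) are hypothetical and are not taken from the actual gobblin.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real gobblin.sh: classify the first argument
# as a command or a service using the value lists from the issue description.

GOBBLIN_COMMANDS="admin cli statestore-check statestore-clean historystore-manager classpath"
GOBBLIN_SERVICES="standalone cluster-master cluster-worker aws yarn mr service"

# Print "command", "service", or "unknown" for the given first argument.
# Returns non-zero when the argument is in neither list.
classify_target() {
  local target="$1" name
  for name in $GOBBLIN_COMMANDS; do
    [ "$name" = "$target" ] && { echo "command"; return 0; }
  done
  for name in $GOBBLIN_SERVICES; do
    [ "$name" = "$target" ] && { echo "service"; return 0; }
  done
  echo "unknown"
  return 1
}

# Succeed only for the actions the description lists after a service name.
is_valid_service_action() {
  case "$1" in
    start|stop|status) return 0 ;;
    *)                 return 1 ;;
  esac
}
```

An entry point built this way would call classify_target on its first argument and, for a service, check the second argument with is_valid_service_action before doing any work. Keeping both lists in one place is what would give every mode the same argument handling.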
[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
htran1 commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r287436214

## File path: bin/gobblin-admin.sh ##
@@ -1,142 +0,0 @@
-#!/bin/bash

Review comment: How about maintaining compatibility with callers of the old script? One option is to leave the old scripts as wrapper scripts that call the new one. Another option is to have the old script names be symlinks to the new script and detect the usage based on the value of $0.
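The symlink option in the review comment could look roughly like the sketch below. It is an illustration only: the function name legacy_to_new_args is hypothetical, and the mapping covers just three of the old script names listed in the GOBBLIN-707 description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: translate a legacy script name (the value of $0 when
# the combined script is reached through a symlink) into the subcommand of
# the combined script. Only names from the issue description are mapped.

legacy_to_new_args() {
  # $1: the path the script was invoked as, typically "$0"
  local invoked_as
  invoked_as="$(basename "$1")"
  case "$invoked_as" in
    gobblin-admin.sh)        echo "admin" ;;
    statestore-clean.sh)     echo "statestore-clean" ;;
    historystore-manager.sh) echo "historystore-manager" ;;
    *)                       echo "" ;;  # not a legacy name: nothing to prepend
  esac
}

# In the combined script, one possible use (assumption, not the PR's code):
#   prefix="$(legacy_to_new_args "$0")"
#   [ -n "$prefix" ] && set -- "$prefix" "$@"
```

The wrapper-script option from the same comment needs no detection at all: each old script would simply exec the new one with a fixed subcommand in front of "$@".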
[jira] [Created] (GOBBLIN-781) Clean Flow execution state when DR is enabled: Skeleton
Lei Sun created GOBBLIN-781:
---
Summary: Clean Flow execution state when DR is enabled: Skeleton
Key: GOBBLIN-781
URL: https://issues.apache.org/jira/browse/GOBBLIN-781
Project: Apache Gobblin
Issue Type: Improvement
Reporter: Lei Sun

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=247923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247923 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--
Author: ASF GitHub Bot
Created on: 24/May/19 07:23
Start Date: 24/May/19 07:23
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r287240383

## File path: gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixRetriggeringJobCallable.java ##
@@ -152,7 +152,7 @@ public Void call() throws JobException {
 GobblinClusterConfigurationKeys.JOB_ALWAYS_DELETE, "false");
-try {
+try { //TODO: what is really the difference ?

Review comment: Why is this relevant to this PR?

Issue Time Tracking
---
Worklog Id: (was: 247923)
Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=247922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247922 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--
Author: ASF GitHub Bot
Created on: 24/May/19 07:23
Start Date: 24/May/19 07:23
Worklog Time Spent: 10m
Work Description: sv2000 commented on pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r287242524

## File path: bin/gobblin.sh ##
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash

Review comment: @jhsenjaliya since this is a big change, and the fact that we use the existing shell scripts in some of our internal use cases, it would be great if you can provide documentation on how to run common commands using the new script.

Issue Time Tracking
---
Worklog Id: (was: 247922)
Time Spent: 6.5h (was: 6h 20m)
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
sv2000 commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r287240383

## File path: gobblin-cluster/src/main/java/org/apache/gobblin/cluster/HelixRetriggeringJobCallable.java ##
@@ -152,7 +152,7 @@ public Void call() throws JobException {
 GobblinClusterConfigurationKeys.JOB_ALWAYS_DELETE, "false");
-try {
+try { //TODO: what is really the difference ?

Review comment: Why is this relevant to this PR?
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
sv2000 commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r287242524

## File path: bin/gobblin.sh ##
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash

Review comment: @jhsenjaliya since this is a big change, and the fact that we use the existing shell scripts in some of our internal use cases, it would be great if you can provide documentation on how to run common commands using the new script.
[jira] [Work logged] (GOBBLIN-771) emit a few metrics for gobblin service
[ https://issues.apache.org/jira/browse/GOBBLIN-771?focusedWorklogId=247914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247914 ]

ASF GitHub Bot logged work on GOBBLIN-771:
--
Author: ASF GitHub Bot
Created on: 24/May/19 06:51
Start Date: 24/May/19 06:51
Worklog Time Spent: 10m
Work Description: asfgit commented on pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2635

Issue Time Tracking
---
Worklog Id: (was: 247914)
Time Spent: 1.5h (was: 1h 20m)

> emit a few metrics for gobblin service
> --
>
> Key: GOBBLIN-771
> URL: https://issues.apache.org/jira/browse/GOBBLIN-771
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Arjun Singh Bora
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
[GitHub] [incubator-gobblin] asfgit closed pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
asfgit closed pull request #2635: [GOBBLIN-771] add a few metrics for gobblin service
URL: https://github.com/apache/incubator-gobblin/pull/2635