[jira] [Resolved] (GOBBLIN-277) Add a lock to make multihop thread safe
[ https://issues.apache.org/jira/browse/GOBBLIN-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu resolved GOBBLIN-277. - Resolution: Fixed > Add a lock to make multihop thread safe > --- > > Key: GOBBLIN-277 > URL: https://issues.apache.org/jira/browse/GOBBLIN-277 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor
[ https://issues.apache.org/jira/browse/GOBBLIN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu resolved GOBBLIN-373. - Resolution: Fixed > Expose task executor auto scale metrics to external sensor > -- > > Key: GOBBLIN-373 > URL: https://issues.apache.org/jira/browse/GOBBLIN-373 > Project: Apache Gobblin > Issue Type: Task >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This is used for LinkedIn inGraph integration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1134) Improve FAQs
Kuai Yu created GOBBLIN-1134: Summary: Improve FAQs Key: GOBBLIN-1134 URL: https://issues.apache.org/jira/browse/GOBBLIN-1134 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Add more details for users who hit an NPE with the error message below. h5. Gradle Build Fails With {{Cannot invoke method getURLs on null object}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1129) clean up staging table created by avro2orc pipeline
Kuai Yu created GOBBLIN-1129: Summary: clean up staging table created by avro2orc pipeline Key: GOBBLIN-1129 URL: https://issues.apache.org/jira/browse/GOBBLIN-1129 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu During the avro2orc conversion, many staging tables are created. They are not cleaned up when the pipeline fails, and the next execution does not clean them up either, so the staging tables take up a lot of space in the Hive metastore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GOBBLIN-1124) Error message should include throwable details when http converter hits a failure
[ https://issues.apache.org/jira/browse/GOBBLIN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu resolved GOBBLIN-1124. -- Resolution: Fixed > Error message should include throwable details when http converter hits a > failure > - > > Key: GOBBLIN-1124 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1124 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1124) Error message should include throwable details when http converter hits a failure
Kuai Yu created GOBBLIN-1124: Summary: Error message should include throwable details when http converter hits a failure Key: GOBBLIN-1124 URL: https://issues.apache.org/jira/browse/GOBBLIN-1124 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-1049) Move workunit commit logic to the end of publish().
Kuai Yu created GOBBLIN-1049: Summary: Move workunit commit logic to the end of publish(). Key: GOBBLIN-1049 URL: https://issues.apache.org/jira/browse/GOBBLIN-1049 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu We should not blindly commit workunit in the BaseDataPublisher. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-962) Refactor RecursiveCopyableDataset so that the copy entities generation logic can be reused.
Kuai Yu created GOBBLIN-962: --- Summary: Refactor RecursiveCopyableDataset so that the copy entities generation logic can be reused. Key: GOBBLIN-962 URL: https://issues.apache.org/jira/browse/GOBBLIN-962 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Refactor RecursiveCopyableDataset so that the copy entities generation logic can be reused. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-915) Extract cannot parse the timezone
Kuai Yu created GOBBLIN-915: --- Summary: Extract cannot parse the timezone Key: GOBBLIN-915 URL: https://issues.apache.org/jira/browse/GOBBLIN-915 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GOBBLIN-889) FileContext doesn't use the correct writer fs uri
Kuai Yu created GOBBLIN-889: --- Summary: FileContext doesn't use the correct writer fs uri Key: GOBBLIN-889 URL: https://issues.apache.org/jira/browse/GOBBLIN-889 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu If we are using the hdfs uri, the current FsDataWriter seems to ignore that uri during FileContext creation; it simply creates the FileContext based on the Configuration object. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level
[ https://issues.apache.org/jira/browse/GOBBLIN-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-851: Description: We had problems when the table-level schema and the partition-level schema diverge. Consider the case where a user registers two partitions, 2019/08/10 and 2019/08/11, but the schema changes in between (S1->S2). Now the table level has schema S2, but 2019/08/10 will have schema S1. A query on the latest schema will cause the old partition to fail. > Provide capability to disable hive schema registration in partition level > - > > Key: GOBBLIN-851 > URL: https://issues.apache.org/jira/browse/GOBBLIN-851 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > > We had problems when the table-level schema and the partition-level schema diverge. > Consider the case where a user registers two partitions, 2019/08/10 and > 2019/08/11, but the schema changes in between (S1->S2). Now the table level has > schema S2, but 2019/08/10 will have schema S1. > A query on the latest schema will cause the old partition to fail. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level
Kuai Yu created GOBBLIN-851: --- Summary: Provide capability to disable hive schema registration in partition level Key: GOBBLIN-851 URL: https://issues.apache.org/jira/browse/GOBBLIN-851 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GOBBLIN-813) Make SFDC connector support encrypted Salesforce client id and client secret
Kuai Yu created GOBBLIN-813: --- Summary: Make SFDC connector support encrypted Salesforce client id and client secret Key: GOBBLIN-813 URL: https://issues.apache.org/jira/browse/GOBBLIN-813 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-783) Fix the double referencing issue for job type config
Kuai Yu created GOBBLIN-783: --- Summary: Fix the double referencing issue for job type config Key: GOBBLIN-783 URL: https://issues.apache.org/jira/browse/GOBBLIN-783 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-751) Make enforced file size matching to be configurable
Kuai Yu created GOBBLIN-751: --- Summary: Make enforced file size matching to be configurable Key: GOBBLIN-751 URL: https://issues.apache.org/jira/browse/GOBBLIN-751 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Make enforced file size matching to be configurable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-729) Add version strategy support for HiveDataset copy
Kuai Yu created GOBBLIN-729: --- Summary: Add version strategy support for HiveDataset copy Key: GOBBLIN-729 URL: https://issues.apache.org/jira/browse/GOBBLIN-729 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu This PR will add data strategy support for Hive dataset copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-713) Lazy load job specification from job catalog to avoid OOM issue when JobCatalog is bootup.
Kuai Yu created GOBBLIN-713: --- Summary: Lazy load job specification from job catalog to avoid OOM issue when JobCatalog is bootup. Key: GOBBLIN-713 URL: https://issues.apache.org/jira/browse/GOBBLIN-713 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Today, whenever the job catalog is restarted, all the job specs are loaded into memory. This can cause OOM issues under our production load. This ticket was created to provide an easy way to load a job spec without materializing all the job specs in memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
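The lazy-loading idea in GOBBLIN-713 can be illustrated with a minimal sketch. Everything below is hypothetical (the class name, the use of plain strings for spec URIs and specs, and the loader function); it is not Gobblin's JobCatalog API. It only shows the general technique of keeping lightweight identifiers in memory and materializing a spec when the caller asks for it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not Gobblin's JobCatalog: specs are loaded one at a time.
final class LazySpecCatalogSketch {
    // Only the lightweight identifiers are held in memory.
    private final List<String> specUris;
    // Assumed loader that turns an identifier into a fully materialized spec.
    private final Function<String, String> loader;

    LazySpecCatalogSketch(List<String> specUris, Function<String, String> loader) {
        this.specUris = specUris;
        this.loader = loader;
    }

    /** Returns an iterator that loads each spec only when next() is called. */
    Iterator<String> specs() {
        Iterator<String> uris = specUris.iterator();
        return new Iterator<String>() {
            @Override public boolean hasNext() { return uris.hasNext(); }
            @Override public String next() { return loader.apply(uris.next()); }
        };
    }
}
```

With this shape, creating the iterator loads nothing; each call to next() invokes the loader for exactly one spec, so a consumer that processes specs one by one never holds all of them at once.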
[jira] [Created] (GOBBLIN-712) Add version strategy for configbased dataset copy
Kuai Yu created GOBBLIN-712: --- Summary: Add version strategy for configbased dataset copy Key: GOBBLIN-712 URL: https://issues.apache.org/jira/browse/GOBBLIN-712 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-703) Allow planning job to be run in a non-blocking way
Kuai Yu created GOBBLIN-703: --- Summary: Allow planning job to be run in a non-blocking way Key: GOBBLIN-703 URL: https://issues.apache.org/jira/browse/GOBBLIN-703 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Today all planning jobs run in a dedicated thread pool, and each waits until its full execution completes. This requires a lot of system resources and a dedicated monitoring thread. The improvement here is to reduce the waiting time on a dedicated monitoring thread. Basically, once the planning job is submitted to Helix, we don't need to wait for the job to complete. Job status monitoring will be handled by GaaS monitoring. By doing this, we free most of the thread pool resources, because each monitoring thread returns immediately after it finishes the job submission. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-690) Relaunch check for the planning job is not correct
Kuai Yu created GOBBLIN-690: --- Summary: Relaunch check for the planning job is not correct Key: GOBBLIN-690 URL: https://issues.apache.org/jira/browse/GOBBLIN-690 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-683) Azkaban client should retry if session gets expired
Kuai Yu created GOBBLIN-683: --- Summary: Azkaban client should retry if session gets expired Key: GOBBLIN-683 URL: https://issues.apache.org/jira/browse/GOBBLIN-683 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-664) AzkabanClient session refresh logic is not configurable
Kuai Yu created GOBBLIN-664: --- Summary: AzkabanClient session refresh logic is not configurable Key: GOBBLIN-664 URL: https://issues.apache.org/jira/browse/GOBBLIN-664 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-661) Prevent jobs resubmission after manager failure
Kuai Yu created GOBBLIN-661: --- Summary: Prevent jobs resubmission after manager failure Key: GOBBLIN-661 URL: https://issues.apache.org/jira/browse/GOBBLIN-661 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu
In a Gobblin cluster, if the manager fails and is relaunched, all the jobs persisted in the job catalog will be relaunched. This can cause a few issues:
1) Scalability: the unfinished jobs might have been submitted at different points in time; if all of them are now submitted at the same time, it can cause a performance issue.
2) Wasted effort: because the unfinished job now needs to be deleted, we have to kill the existing running job and resubmit it.
This change improves both 1) and 2):
1) In taskdriver mode, we delete the job spec once we submit to Helix, because we believe Helix is durable and none of the submitted jobs will be lost, so we can safely delete the job specs. After the next reboot, the manager won't see those deleted job specs, so no resubmission is needed.
2) In taskdriver mode, we clean up running Helix jobs. If it is a planning job, we won't delete it; instead we just let it run to the end.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
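The first improvement in GOBBLIN-661 (delete the job spec once it has been handed to Helix) can be sketched as follows. This is a simplified, hypothetical model: the in-memory map stands in for the persisted job catalog, the list stands in for jobs accepted by Helix, and none of the names are Gobblin or Helix APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "delete the spec after submission"; not Gobblin code.
final class ResubmissionSketch {
    /** Job specs persisted in the catalog, keyed by job name. */
    final Map<String, String> catalog = new LinkedHashMap<>();
    /** Jobs handed to the durable scheduler (Helix in the ticket). */
    final List<String> submitted = new ArrayList<>();

    void addSpec(String jobName, String spec) {
        catalog.put(jobName, spec);
    }

    /** Submits the job, then deletes its spec so a restart does not see it again. */
    void submit(String jobName) {
        submitted.add(jobName);
        catalog.remove(jobName);
    }

    /** Simulates a manager restart: every spec still in the catalog is submitted. */
    void restart() {
        // Copy the key set because submit() mutates the catalog while we iterate.
        for (String jobName : new ArrayList<>(catalog.keySet())) {
            submit(jobName);
        }
    }
}
```

In this model a job that was already submitted before the restart is not submitted a second time, because its spec is no longer in the catalog when restart() runs.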
[jira] [Updated] (GOBBLIN-655) Allow helix jobs to have job type set
[ https://issues.apache.org/jira/browse/GOBBLIN-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-655: Description: This is required because Helix will use job type for the quota assignment. For job prioritization, we need to have different quota for different priority jobs. > Allow helix jobs to have job type set > - > > Key: GOBBLIN-655 > URL: https://issues.apache.org/jira/browse/GOBBLIN-655 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This is required because Helix will use job type for the quota assignment. > For job prioritization, we need to have different quota for different > priority jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-655) Allow helix jobs to have job type set
Kuai Yu created GOBBLIN-655: --- Summary: Allow helix jobs to have job type set Key: GOBBLIN-655 URL: https://issues.apache.org/jira/browse/GOBBLIN-655 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-652) gobblin cluster doesn't have helix related metrics
Kuai Yu created GOBBLIN-652: --- Summary: gobblin cluster doesn't have helix related metrics Key: GOBBLIN-652 URL: https://issues.apache.org/jira/browse/GOBBLIN-652 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-649) Add task driver cluster
Kuai Yu created GOBBLIN-649: --- Summary: Add task driver cluster Key: GOBBLIN-649 URL: https://issues.apache.org/jira/browse/GOBBLIN-649 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-647) Move early stop feature to task driver
Kuai Yu created GOBBLIN-647: --- Summary: Move early stop feature to task driver Key: GOBBLIN-647 URL: https://issues.apache.org/jira/browse/GOBBLIN-647 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-639) Change serder method to static from RequesterService
Kuai Yu created GOBBLIN-639: --- Summary: Change serder method to static from RequesterService Key: GOBBLIN-639 URL: https://issues.apache.org/jira/browse/GOBBLIN-639 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-626) Fix the wrong planning job tag key
Kuai Yu created GOBBLIN-626: --- Summary: Fix the wrong planning job tag key Key: GOBBLIN-626 URL: https://issues.apache.org/jira/browse/GOBBLIN-626 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-625) Distributed job launcher doesn't have helix tagging support
Kuai Yu created GOBBLIN-625: --- Summary: Distributed job launcher doesn't have helix tagging support Key: GOBBLIN-625 URL: https://issues.apache.org/jira/browse/GOBBLIN-625 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu This ticket was created to add Helix tagging support for the distributed job launcher. In the distributed job launcher, when a planning job is sent out, it can be tagged with "gobblin.cluster.helixPlanningJobTag". This differs from "gobblin.cluster.helixJobTag" because the latter is used for the actual job instead of a planning job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
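As a rough illustration of how the two keys in GOBBLIN-625 might be told apart, the sketch below picks the tag key based on whether the job is a planning job. The two property names come from the ticket; the class, the method, and the use of java.util.Properties are assumptions made for this example and are not taken from Gobblin.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper; only the two property names come from the ticket.
final class HelixTagSketch {
    static final String PLANNING_JOB_TAG_KEY = "gobblin.cluster.helixPlanningJobTag";
    static final String JOB_TAG_KEY = "gobblin.cluster.helixJobTag";

    /** Returns the tag configured for this kind of job, if any. */
    static Optional<String> tagFor(Properties jobProps, boolean planningJob) {
        String key = planningJob ? PLANNING_JOB_TAG_KEY : JOB_TAG_KEY;
        return Optional.ofNullable(jobProps.getProperty(key));
    }
}
```

A job configuration that sets only one of the keys yields an empty Optional for the other kind of job, which a caller could treat as "no tag".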
[jira] [Created] (GOBBLIN-622) Avoid to serialize all previous workunits in SourceState to save both memory and diskspace
Kuai Yu created GOBBLIN-622: --- Summary: Avoid to serialize all previous workunits in SourceState to save both memory and diskspace Key: GOBBLIN-622 URL: https://issues.apache.org/jira/browse/GOBBLIN-622 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-620) NPE when catalog metrics is not enabled
Kuai Yu created GOBBLIN-620: --- Summary: NPE when catalog metrics is not enabled Key: GOBBLIN-620 URL: https://issues.apache.org/jira/browse/GOBBLIN-620 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu When getStandardMetrics is invoked, the metrics object is null if metrics are not enabled. This causes the ImmutableList to be built with a null element, which raises an NPE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
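The failure mode in GOBBLIN-620 can be reproduced in miniature. The sketch below uses the JDK's List.of (Java 9 or later) as a stand-in for Guava's ImmutableList, since both reject null elements; the class and method names are hypothetical, and the guard shown is just one possible fix, not necessarily the one applied in Gobblin.

```java
import java.util.List;

// Hypothetical stand-ins; the names below are not Gobblin's real classes.
final class CatalogMetricsSketch {
    /** Placeholder for a metrics object. */
    static final class Metrics {
        final String name;
        Metrics(String name) { this.name = name; }
    }

    // Stays null when metrics are disabled.
    private final Metrics metrics;

    CatalogMetricsSketch(boolean metricsEnabled) {
        this.metrics = metricsEnabled ? new Metrics("catalog") : null;
    }

    /** Mirrors the reported failure: List.of rejects null elements. */
    List<Metrics> unsafeStandardMetrics() {
        return List.of(metrics); // throws NullPointerException when metrics is null
    }

    /** One possible guard: return an empty list when metrics are disabled. */
    List<Metrics> safeStandardMetrics() {
        return metrics == null ? List.of() : List.of(metrics);
    }
}
```

With metrics disabled, unsafeStandardMetrics() throws while safeStandardMetrics() returns an empty list, so callers that iterate over the result need no special case.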
[jira] [Created] (GOBBLIN-619) Fix the metrics name for GobblinHelixJobScheduler
Kuai Yu created GOBBLIN-619: --- Summary: Fix the metrics name for GobblinHelixJobScheduler Key: GOBBLIN-619 URL: https://issues.apache.org/jira/browse/GOBBLIN-619 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-618) Remove unnecessary methods from StandardMetricsBridge
Kuai Yu created GOBBLIN-618: --- Summary: Remove unnecessary methods from StandardMetricsBridge Key: GOBBLIN-618 URL: https://issues.apache.org/jira/browse/GOBBLIN-618 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-617) Add distributed job launcher metrics and some refactoring.
Kuai Yu created GOBBLIN-617: --- Summary: Add distributed job launcher metrics and some refactoring. Key: GOBBLIN-617 URL: https://issues.apache.org/jira/browse/GOBBLIN-617 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu Add metrics for GobblinHelixJobTask. Refactor metrics for GobblinHelixJobScheduler. Refactor metrics for GobblinHelixJobLauncher. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-615) Make LWM==HWM a valid interval in QueryBaseSource
Kuai Yu created GOBBLIN-615: --- Summary: Make LWM==HWM a valid interval in QueryBaseSource Key: GOBBLIN-615 URL: https://issues.apache.org/jira/browse/GOBBLIN-615 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu
We have seen many issues in DateWatermark where the job intermittently failed every other day. The reason is as follows:
1. On 10-02 at 17:47, the job pulls with logindate >= 2018-10-01 (HWM = 10-2; when the job finished, Actual_HWM is 10/2).
2. On the same date, 10-02, if the job repulled, we would have LWM=10-3, HWM=10-2, and the job would fail as expected.
3. On 10-03 at 17:47, the job fails to generate any workunits because now LWM = Actual_HWM + 1 = 10-3 and HWM = 10-3. According to DateWatermark::getIntervals(), the startTime must be less than endTime to generate an interval.
4. On 10-04 at 17:47, the job recovers because LWM stays at 10-3 and HWM = 10-4, so a valid interval is generated again.
The fix here is to let DateWatermark generate an interval at step 3, so that we won't have an intermittent failure there. However, this fix causes another problem. Today we could have missing data in steps 1 and 4, because step 1 pulls data for 10/2 too early and step 4 pulls data for 10/4 too early, but at least step 3 pulls the whole data for 10/3. After this fix, 10/3 will be pulled too early as well. So this fix needs to work together with the Cutoff feature, so that we only pull 10-1's data on 10/2.
Thanks, Kuai
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
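The difference between the strict check described in step 3 and the proposed relaxed check can be sketched as below. This is an illustration only: the class, the Interval type, and both method names are hypothetical, and the real DateWatermark::getIntervals() works with different types and partitioning logic.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical sketch of the interval check; not the real DateWatermark code.
final class WatermarkIntervalSketch {
    /** Simple date interval from the low watermark to the high watermark. */
    static final class Interval {
        final LocalDate low;
        final LocalDate high;
        Interval(LocalDate low, LocalDate high) { this.low = low; this.high = high; }
    }

    /** Strict check (the behaviour described in the ticket): LWM must be before HWM. */
    static Optional<Interval> strictInterval(LocalDate low, LocalDate high) {
        return low.isBefore(high) ? Optional.of(new Interval(low, high)) : Optional.empty();
    }

    /** Relaxed check (the proposed fix): LWM == HWM also yields an interval. */
    static Optional<Interval> relaxedInterval(LocalDate low, LocalDate high) {
        return !low.isAfter(high) ? Optional.of(new Interval(low, high)) : Optional.empty();
    }
}
```

For step 3 (LWM = HWM = 2018-10-03) the strict check yields no interval, while the relaxed check yields a single-day interval. Both still reject the step 2 case where LWM is after HWM.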
[jira] [Created] (GOBBLIN-608) Allow user to configure the fork operation timeout
Kuai Yu created GOBBLIN-608: --- Summary: Allow user to configure the fork operation timeout Key: GOBBLIN-608 URL: https://issues.apache.org/jira/browse/GOBBLIN-608 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu Allow fork operation to be configured with max waiting time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-602) Allow AzkabanProducer to be customized
Kuai Yu created GOBBLIN-602: --- Summary: Allow AzkabanProducer to be customized Key: GOBBLIN-602 URL: https://issues.apache.org/jira/browse/GOBBLIN-602 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu Use java reflection to construct the AzkabanProducer class instead of always assuming it's an AzkabanProducer.class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
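The reflection approach mentioned in GOBBLIN-602 can be sketched in general terms as follows. BaseProducer and CustomProducer are hypothetical stand-ins for AzkabanProducer and a user-supplied subclass, and the no-argument constructor is an assumption; the real class may take constructor arguments, in which case the matching parameter types would be passed to getDeclaredConstructor.

```java
// Hypothetical sketch of configurable instantiation via reflection.
final class ProducerFactorySketch {
    /** Stand-in for the default producer type. */
    static class BaseProducer {
        String describe() { return "base"; }
    }

    /** Stand-in for a customized, user-supplied producer. */
    static class CustomProducer extends BaseProducer {
        @Override String describe() { return "custom"; }
    }

    /** Instantiates the configured class through its no-arg constructor. */
    static BaseProducer create(String className) throws ReflectiveOperationException {
        // asSubclass fails fast with ClassCastException if the class is unrelated.
        Class<? extends BaseProducer> cls =
            Class.forName(className).asSubclass(BaseProducer.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Passing the binary name of CustomProducer to create() returns an instance of the subclass, so the caller no longer hard-codes which producer class is constructed.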
[jira] [Created] (GOBBLIN-601) Add cancellation to AzkabanClient
Kuai Yu created GOBBLIN-601: --- Summary: Add cancellation to AzkabanClient Key: GOBBLIN-601 URL: https://issues.apache.org/jira/browse/GOBBLIN-601 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-591) Allow user to pass in the customized http client for azkaban client
Kuai Yu created GOBBLIN-591: --- Summary: Allow user to pass in the customized http client for azkaban client Key: GOBBLIN-591 URL: https://issues.apache.org/jira/browse/GOBBLIN-591 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-588) Remove final decorator to allow password be overwritten
Kuai Yu created GOBBLIN-588: --- Summary: Remove final decorator to allow password be overwritten Key: GOBBLIN-588 URL: https://issues.apache.org/jira/browse/GOBBLIN-588 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-584) Fix the helix key configuration naming.
Kuai Yu created GOBBLIN-584: --- Summary: Fix the helix key configuration naming. Key: GOBBLIN-584 URL: https://issues.apache.org/jira/browse/GOBBLIN-584 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-575) Remove scala dependencies
Kuai Yu created GOBBLIN-575: --- Summary: Remove scala dependencies Key: GOBBLIN-575 URL: https://issues.apache.org/jira/browse/GOBBLIN-575 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Some of the Scala dependencies were added in build.gradle, which caused some scala binary detection tools to fail within LinkedIn. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-574) elasticsearch-dep module failed to build
Kuai Yu created GOBBLIN-574: --- Summary: elasticsearch-dep module failed to build Key: GOBBLIN-574 URL: https://issues.apache.org/jira/browse/GOBBLIN-574 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu
> Configure project :gobblin-distribution
Using flavor:standard for project gobblin-distribution
Download https://plugins.gradle.org/m2/com/commercehub/gradle/plugin/avro-base/com.commercehub.gradle.plugin.avro-base.gradle.plugin/0.9.0/com.commercehub.gradle.plugin.avro-base.gradle.plugin-0.9.0.pom
Download https://plugins.gradle.org/m2/com/commercehub/gradle/plugin/gradle-avro-plugin/0.9.0/gradle-avro-plugin-0.9.0.pom
Download https://plugins.gradle.org/m2/com/github/johnrengelman/shadow/com.github.johnrengelman.shadow.gradle.plugin/1.2.4/com.github.johnrengelman.shadow.gradle.plugin-1.2.4.pom
FAILURE: Build failed with an exception.
What went wrong:
A problem occurred configuring project ':gobblin-modules:gobblin-elasticsearch-deps'.
> Could not get unknown property 'mavenDeployer' for repository container of type org.gradle.api.internal.artifacts.dsl.DefaultRepositoryHandler.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-550) When RuntimeException occurred, alwaysDelete flag doesn't work
Kuai Yu created GOBBLIN-550: --- Summary: When RuntimeException occurred, alwaysDelete flag doesn't work Key: GOBBLIN-550 URL: https://issues.apache.org/jira/browse/GOBBLIN-550 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-546) Use appropriate Lists package
Kuai Yu created GOBBLIN-546: --- Summary: Use appropriate Lists package Key: GOBBLIN-546 URL: https://issues.apache.org/jira/browse/GOBBLIN-546 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-536) Allow user to configure connection string properties in mysql extractor
Kuai Yu created GOBBLIN-536: --- Summary: Allow user to configure connection string properties in mysql extractor Key: GOBBLIN-536 URL: https://issues.apache.org/jira/browse/GOBBLIN-536 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-535) Add second hop for distributed job launcher
Kuai Yu created GOBBLIN-535: --- Summary: Add second hop for distributed job launcher Key: GOBBLIN-535 URL: https://issues.apache.org/jira/browse/GOBBLIN-535 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu Assignee: Kuai Yu In the previous PR (https://github.com/apache/incubator-gobblin/pull/2360), a planning job can be distributed to a remote node, but the remote node performs a no-op. In this PR, the remote node will do the actual GobblinHelixJobLauncher work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-532) Always delete jobSpec no matter if the job is successful or not
Kuai Yu created GOBBLIN-532: --- Summary: Always delete jobSpec no matter if the job is successful or not Key: GOBBLIN-532 URL: https://issues.apache.org/jira/browse/GOBBLIN-532 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-522) Multiple build issues
Kuai Yu created GOBBLIN-522: --- Summary: Multiple build issues Key: GOBBLIN-522 URL: https://issues.apache.org/jira/browse/GOBBLIN-522 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-490) Add planning job execution launcher
[ https://issues.apache.org/jira/browse/GOBBLIN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-490: Description: This new job launcher will forward the original job to one of the GobblinTaskRunner(s). Instead of executing the task driver logic on GobblinClusterManager, the task driver logic can now be run on GobblinTaskRunner. (was: This new job launcher will submit the original job to Helix instead of the generated work units. This will allow Helix to redistribute the original jobs to different worker nodes. Each worker then processes the original jobs and launches its own GobblinHelixJobLauncher. By doing this, we relieve the manager node, because the task driver logic (mainly generating work units) is now distributed to worker nodes.) > Add planning job execution launcher > > > Key: GOBBLIN-490 > URL: https://issues.apache.org/jira/browse/GOBBLIN-490 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This new job launcher will forward the original job to one of the > GobblinTaskRunner(s). Instead of executing the task driver logic on > GobblinClusterManager, the task driver logic can now be run on > GobblinTaskRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-490) Add planning job execution launcher
[ https://issues.apache.org/jira/browse/GOBBLIN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-490: Summary: Add planning job execution launcher (was: Allow job to be distributed by Helix) > Add planning job execution launcher > > > Key: GOBBLIN-490 > URL: https://issues.apache.org/jira/browse/GOBBLIN-490 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This new job launcher will submit the original job to Helix instead of the > generated work units. This will allow Helix to redistribute the original > jobs to different worker nodes. Each worker then processes the original jobs > and launches its own GobblinHelixJobLauncher. By doing this, we relieve the > manager node, because the task driver logic (mainly generating work units) is > now distributed to worker nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-510) Decouple JobExecutionLauncher and JobExecutionDriver
Kuai Yu created GOBBLIN-510: --- Summary: Decouple JobExecutionLauncher and JobExecutionDriver Key: GOBBLIN-510 URL: https://issues.apache.org/jira/browse/GOBBLIN-510 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu Assignee: Kuai Yu Today JobExecutionLauncher and JobExecutionDriver are coupled: when JobExecutionLauncher invokes launchJob, a JobExecutionDriver is immediately returned. This is not good for a Gobblin cluster, because the launcher might be running on the manager node while the actual driver logic is running on a worker node. We need some refactoring to allow us to decouple the two. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-506) Job tagging support in Gobblin cluster
Kuai Yu created GOBBLIN-506: --- Summary: Job tagging support in Gobblin cluster Key: GOBBLIN-506 URL: https://issues.apache.org/jira/browse/GOBBLIN-506 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-504) HiveMetastoreClientPool has findbugsMain issue due to unprotected static variable initialization
Kuai Yu created GOBBLIN-504: --- Summary: HiveMetastoreClientPool has findbugsMain issue due to unprotected static variable initialization Key: GOBBLIN-504 URL: https://issues.apache.org/jira/browse/GOBBLIN-504 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-497) GobblinHelixJobScheduler should not start scheduling before the scheduler service is up
Kuai Yu created GOBBLIN-497: --- Summary: GobblinHelixJobScheduler should not start scheduling before the scheduler service is up Key: GOBBLIN-497 URL: https://issues.apache.org/jira/browse/GOBBLIN-497 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-495) FlowSpec should be deleted if this is run once flow
Kuai Yu created GOBBLIN-495: --- Summary: FlowSpec should be deleted if this is run once flow Key: GOBBLIN-495 URL: https://issues.apache.org/jira/browse/GOBBLIN-495 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-495) FlowSpec should be deleted if this is run once flow
[ https://issues.apache.org/jira/browse/GOBBLIN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-495: Issue Type: Bug (was: New Feature) > FlowSpec should be deleted if this is run once flow > --- > > Key: GOBBLIN-495 > URL: https://issues.apache.org/jira/browse/GOBBLIN-495 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-490) Allow job to be distributed by Helix
[ https://issues.apache.org/jira/browse/GOBBLIN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-490: Description: This new job launcher will submit the original job to Helix instead of the generated work units. This will allow Helix to redistribute the original jobs to different worker nodes. Each worker then processes the original jobs and launches its own GobblinHelixJobLauncher. By doing this, we relieve the manager node, because the task driver logic (mainly generating work units) is now distributed to the worker nodes. > Allow job to be distributed by Helix > > > Key: GOBBLIN-490 > URL: https://issues.apache.org/jira/browse/GOBBLIN-490 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This new job launcher will submit the original job to Helix instead of the > generated work units. This will allow Helix to redistribute the original > jobs to different worker nodes. Each worker then processes the original jobs > and launches its own GobblinHelixJobLauncher. By doing this, we relieve the > manager node, because the task driver logic (mainly generating work units) is > now distributed to the worker nodes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-490) Allow job to be distributed by Helix
Kuai Yu created GOBBLIN-490: --- Summary: Allow job to be distributed by Helix Key: GOBBLIN-490 URL: https://issues.apache.org/jira/browse/GOBBLIN-490 Project: Apache Gobblin Issue Type: New Feature Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-484) Propagate fork exception to task commit
Kuai Yu created GOBBLIN-484: --- Summary: Propagate fork exception to task commit Key: GOBBLIN-484 URL: https://issues.apache.org/jira/browse/GOBBLIN-484 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu >>> Today, if an exception occurs at the task level, we do not propagate it to the commit phase, so in fork.commit we see exceptions like this: 2018/04/30 08:03:19.369 ERROR [Task] [Task-committing-pool-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_DYNAMICS-CONTACT-438563007_1525075320170_0 at org.apache.gobblin.runtime.Task.commit(Task.java:884) at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:167) at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:162) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) >>> However, the root cause happened earlier, before the commit phase: in the task run() stage, some records failed to process: 2018/04/30 08:03:19.352 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Processing record incurs an unexpected exception: java.lang.IllegalStateException: Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 has failed and is no longer running at
org.apache.gobblin.runtime.fork.Fork.putRecord(Fork.java:285) at org.apache.gobblin.runtime.Task.processRecord(Task.java:778) at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:459) at org.apache.gobblin.runtime.Task.run(Task.java:341) at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443) at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2018/04/30 08:03:19.353 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed java.lang.RuntimeException at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:464) at org.apache.gobblin.runtime.Task.run(Task.java:341) at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443) at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2018/04/30 08:03:19.368 INFO [com_2792] [TaskState >>> Looking further into the problem, we find that it is due to a record processing timeout in the Espresso writer: 2018/04/30 08:03:19.348 ERROR [Fork-0] [ForkExecutor-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed to process data records java.io.IOException: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on
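The idea in GOBBLIN-484 can be sketched as follows. This is a minimal illustration, not Gobblin's Task or Fork code: SketchTask, run, and commit are hypothetical names, and the assumption is simply that the first failure seen during the run phase is remembered and attached as the cause when commit fails.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ForkFailureSketch {
    /** Hypothetical stand-in for a task; not Gobblin's Task class. */
    static class SketchTask {
        // Holds the first failure seen while processing records.
        private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

        void run(Runnable recordProcessing) {
            try {
                recordProcessing.run();
            } catch (RuntimeException e) {
                // Keep only the earliest failure; later ones are usually consequences of it.
                firstFailure.compareAndSet(null, e);
            }
        }

        void commit() {
            Throwable cause = firstFailure.get();
            if (cause != null) {
                // Surface the run-phase root cause instead of a bare commit-phase error.
                throw new IllegalStateException("Task failed before commit", cause);
            }
        }
    }

    /** Runs a failing task and returns the message of the cause reported at commit time. */
    static String rootCauseMessage() {
        SketchTask task = new SketchTask();
        task.run(() -> { throw new IllegalStateException("record processing timed out"); });
        try {
            task.commit();
            return "committed";
        } catch (IllegalStateException e) {
            return e.getCause().getMessage();
        }
    }
}
```

With this shape, the commit-phase log entry would carry the run-phase exception as its cause, so the reader does not have to search earlier log lines for it.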
[jira] [Updated] (GOBBLIN-480) Allow job distribution cluster to be separated from cluster manager cluster
[ https://issues.apache.org/jira/browse/GOBBLIN-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-480: Description: Today GobblinClusterManager uses a single Helix cluster for both job distribution and cluster manager HA. This all-in-one mode cannot work with the Helix super controller, because GobblinClusterManager creates its own dedicated controller for HA handling, which is internal to the Gobblin framework. This architecture works fine, but we have gradually found it hard to monitor Helix behavior and debug Helix-related issues due to the lack of Helix task framework metrics, which come for free but are only available when using a dedicated controller under the Helix super controller's supervision. To allow the migration, we separated the existing cluster into two clusters: 1. Our existing cluster remains the same, but is called the "job distribution cluster" in separation mode. In unit test or local deployment mode, we create a dedicated controller for this cluster. In production mode, we can assume Helix will provide a dedicated controller for us. 2. A new cluster is created, called the 'manager cluster', which is responsible for cluster manager leadership change. It provides a leadership change callback just as in the earlier all-in-one mode. The new 'two cluster mode' can be turned on/off by user configuration. Similarly, users can configure whether a controller for job distribution should be created. was: Today GobblinClusterManager leverages single Helix cluster responsible for both job distribution and cluster manager HA. This all-in-one mode cannot works with Helix super controller, because GobblinClusterManager will create its own dedicated controller for HA handling, which is internal to Gobblin framework. This architect works fine but gradually we find it's hard to monitor Helix behavior and debug Helix related issues due to the lack of Helix task framework metrics, which is enabled for free, but only available when using a dedicated controllers under Helix super controller's supervision. To allow the migration, we separated existing cluster into two clusters: 1. Our existing cluster will remain the same, called "job distribution cluster". In unit test or local deployment mode, we will create a dedicated controller for this cluster. In production mode, we assume Helix will provide this dedicated controller for us. 2. A new cluster will be created, called 'manager cluster', which is responsible for cluster manager leadership change. This will leadership change callback just like we did earlier in all-in-one mode. Two cluster mode can be turned on/off by user configuration. Similarly to whether a controller for job distribution should be created. > Allow job distribution cluster to be separated from cluster manager cluster > --- > > Key: GOBBLIN-480 > URL: https://issues.apache.org/jira/browse/GOBBLIN-480 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > Today GobblinClusterManager uses a single Helix cluster for both job > distribution and cluster manager HA. This all-in-one mode cannot work with > the Helix super controller, because GobblinClusterManager creates its own > dedicated controller for HA handling, which is internal to the Gobblin > framework. This architecture works fine, but we have gradually found it hard > to monitor Helix behavior and debug Helix-related issues due to the lack of > Helix task framework metrics, which come for free but are only available > when using a dedicated controller under the Helix super controller's > supervision. > To allow the migration, we separated the existing cluster into two clusters: > 1. Our existing cluster remains the same, but is called the "job distribution > cluster" in separation mode. In unit test or local deployment mode, we > create a dedicated controller for this cluster. In production mode, we > can assume Helix will provide a dedicated controller for us. > 2. A new cluster is created, called the 'manager cluster', which is > responsible for cluster manager leadership change. It provides a > leadership change callback just as in the earlier all-in-one mode. > The new 'two cluster mode' can be turned on/off by user configuration. > Similarly, users can configure whether a controller for job distribution > should be created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-480) Allow job distribution cluster to be separated from cluster manager cluster
Kuai Yu created GOBBLIN-480: --- Summary: Allow job distribution cluster to be separated from cluster manager cluster Key: GOBBLIN-480 URL: https://issues.apache.org/jira/browse/GOBBLIN-480 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu Today GobblinClusterManager uses a single Helix cluster for both job distribution and cluster manager HA. This all-in-one mode cannot work with the Helix super controller, because GobblinClusterManager creates its own dedicated controller for HA handling, which is internal to the Gobblin framework. This architecture works fine, but we have gradually found it hard to monitor Helix behavior and debug Helix-related issues due to the lack of Helix task framework metrics, which come for free but are only available when using a dedicated controller under the Helix super controller's supervision. To allow the migration, we separated the existing cluster into two clusters: 1. Our existing cluster remains the same and is called the "job distribution cluster". In unit test or local deployment mode, we create a dedicated controller for this cluster. In production mode, we assume Helix will provide this dedicated controller for us. 2. A new cluster is created, called the 'manager cluster', which is responsible for cluster manager leadership change. It provides a leadership change callback just as in the earlier all-in-one mode. Two cluster mode can be turned on/off by user configuration. Similarly, users can configure whether a controller for job distribution should be created. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-476) Add helix task timeout
Kuai Yu created GOBBLIN-476: --- Summary: Add helix task timeout Key: GOBBLIN-476 URL: https://issues.apache.org/jira/browse/GOBBLIN-476 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-473) Allow user to configure different lookback time for different datasets
Kuai Yu created GOBBLIN-473: --- Summary: Allow user to configure different lookback time for different datasets Key: GOBBLIN-473 URL: https://issues.apache.org/jira/browse/GOBBLIN-473 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-466) Reuse same connector for Salesforce dynamic partitioning
Kuai Yu created GOBBLIN-466: --- Summary: Reuse same connector for Salesforce dynamic partitioning Key: GOBBLIN-466 URL: https://issues.apache.org/jira/browse/GOBBLIN-466 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu We add a getConnector method to the Salesforce source class so that: 1) any derived class can override this method; 2) the same connector is always used to get watermark metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
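The two goals in GOBBLIN-466 can be sketched roughly as below. Only the method name getConnector comes from the issue; Connector, BaseSource, CustomSource, and fetchWatermarkMetadata are hypothetical names used for illustration and are not the Salesforce source's actual classes.

```java
public class ConnectorReuseSketch {
    /** Hypothetical connector type; stands in for whatever client the source uses. */
    static class Connector {
        final String name;
        Connector(String name) { this.name = name; }
    }

    static class BaseSource {
        private Connector connector;

        // Derived classes may override this to supply their own connector.
        protected synchronized Connector getConnector() {
            if (connector == null) {
                // Created lazily on first use, then reused for every later call.
                connector = new Connector("default");
            }
            return connector;
        }

        String fetchWatermarkMetadata() {
            // Always go through getConnector() so the same instance is reused.
            return "metadata via " + getConnector().name;
        }
    }

    static class CustomSource extends BaseSource {
        private final Connector custom = new Connector("custom");

        @Override
        protected Connector getConnector() {
            return custom;
        }
    }
}
```

Routing every lookup through one overridable accessor keeps the reuse guarantee in a single place, so a subclass changes the connector without touching the watermark logic.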
[jira] [Created] (GOBBLIN-448) Add glob pattern blacklist in ConfigurableGlobDatasetFinder
Kuai Yu created GOBBLIN-448: --- Summary: Add glob pattern blacklist in ConfigurableGlobDatasetFinder Key: GOBBLIN-448 URL: https://issues.apache.org/jira/browse/GOBBLIN-448 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-445) Add task output directory for staging compaction result
Kuai Yu created GOBBLIN-445: --- Summary: Add task output directory for staging compaction result Key: GOBBLIN-445 URL: https://issues.apache.org/jira/browse/GOBBLIN-445 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-436) Salesforce doesn't have default constructor
Kuai Yu created GOBBLIN-436: --- Summary: Salesforce doesn't have default constructor Key: GOBBLIN-436 URL: https://issues.apache.org/jira/browse/GOBBLIN-436 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-423) Limit records or bucket counts for dynamic probing
Kuai Yu created GOBBLIN-423: --- Summary: Limit records or bucket counts for dynamic probing Key: GOBBLIN-423 URL: https://issues.apache.org/jira/browse/GOBBLIN-423 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-419) Add more metrics for cluster job scheduling
Kuai Yu created GOBBLIN-419: --- Summary: Add more metrics for cluster job scheduling Key: GOBBLIN-419 URL: https://issues.apache.org/jira/browse/GOBBLIN-419 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-416) Allow user to configure java options to launch child process for cluster task isolation
Kuai Yu created GOBBLIN-416: --- Summary: Allow user to configure java options to launch child process for cluster task isolation Key: GOBBLIN-416 URL: https://issues.apache.org/jira/browse/GOBBLIN-416 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-403) Fix the NPE issue due to uninitialized kafkajobmonitor metrics
Kuai Yu created GOBBLIN-403: --- Summary: Fix the NPE issue due to uninitialized kafkajobmonitor metrics Key: GOBBLIN-403 URL: https://issues.apache.org/jira/browse/GOBBLIN-403 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-378) Task only publish data when the state is successful in the earlier processing
Kuai Yu created GOBBLIN-378: --- Summary: Task only publish data when the state is successful in the earlier processing Key: GOBBLIN-378 URL: https://issues.apache.org/jira/browse/GOBBLIN-378 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor
[ https://issues.apache.org/jira/browse/GOBBLIN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327617#comment-16327617 ] Kuai Yu commented on GOBBLIN-373: - [~jbaranick], we are using the *StandardMetricsBridge* interface to expose these metrics (this PR only exposes the metrics). Inside LinkedIn, we have a separate internal project that converts these metrics into another type of object called a Sensor, which LinkedIn uses to show metrics on many dashboards. > Expose task executor auto scale metrics to external sensor > -- > > Key: GOBBLIN-373 > URL: https://issues.apache.org/jira/browse/GOBBLIN-373 > Project: Apache Gobblin > Issue Type: Task >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > > This is used for LinkedIn inGraph integration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor
Kuai Yu created GOBBLIN-373: --- Summary: Expose task executor auto scale metrics to external sensor Key: GOBBLIN-373 URL: https://issues.apache.org/jira/browse/GOBBLIN-373 Project: Apache Gobblin Issue Type: Task Reporter: Kuai Yu Assignee: Kuai Yu This is used for LinkedIn inGraph integration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-358) Add logs for GobblinMetrics
Kuai Yu created GOBBLIN-358: --- Summary: Add logs for GobblinMetrics Key: GOBBLIN-358 URL: https://issues.apache.org/jira/browse/GOBBLIN-358 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-356) hanging when retrieving kafka schema
Kuai Yu created GOBBLIN-356: --- Summary: hanging when retrieving kafka schema Key: GOBBLIN-356 URL: https://issues.apache.org/jira/browse/GOBBLIN-356 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-349) Add gauges for gobblin cluster metrics
Kuai Yu created GOBBLIN-349: --- Summary: Add gauges for gobblin cluster metrics Key: GOBBLIN-349 URL: https://issues.apache.org/jira/browse/GOBBLIN-349 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Although we already have counter metrics, we also add gauge metrics for completeness: internally, LinkedIn uses a healthcheck sensor to process the metrics, and a counter is treated as a rate rather than a real-time number. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-326) Gobblin metrics constructor only provides default constructor for Codahale metrics
Kuai Yu created GOBBLIN-326: --- Summary: Gobblin metrics constructor only provides default constructor for Codahale metrics Key: GOBBLIN-326 URL: https://issues.apache.org/jira/browse/GOBBLIN-326 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-320) Add metrics to GobblinHelixJobScheduler
Kuai Yu created GOBBLIN-320: --- Summary: Add metrics to GobblinHelixJobScheduler Key: GOBBLIN-320 URL: https://issues.apache.org/jira/browse/GOBBLIN-320 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-308) Gobblin cluster bootup hangs
Kuai Yu created GOBBLIN-308: --- Summary: Gobblin cluster bootup hangs Key: GOBBLIN-308 URL: https://issues.apache.org/jira/browse/GOBBLIN-308 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu The problem happens when there are more than 100 files in the job catalog. During the boot-up sequence, the spec consumer was launched after the jobCatalog. However, the jobCatalog launches with a job listener that pushes job specs into a blocking queue, and because the spec consumer has not been started yet, nothing consumes job specs from the queue. Once the queue reaches its maximum size (100 by default), the system hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
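The hang described in GOBBLIN-308 can be reproduced in miniature with a bounded queue. The sketch below is an illustration only: BootOrderSketch and its method names are hypothetical, the queue holds plain strings rather than job specs, and offer with a timeout is used in place of a blocking put so that the first method reports the stall instead of hanging.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BootOrderSketch {
    /** Producer runs with no consumer: returns how many items fit before it would block. */
    static int enqueueWithoutConsumer(int capacity, int count) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        try {
            for (int i = 0; i < count; i++) {
                // A plain put() would block forever here once the queue is full.
                if (!queue.offer("spec-" + i, 10, TimeUnit.MILLISECONDS)) {
                    break;
                }
                accepted++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return accepted;
    }

    /** Consumer is started before the producer: returns how many items were consumed. */
    static int enqueueWithConsumerFirst(int capacity, int count) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger consumed = new AtomicInteger();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.take();
                    consumed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < count; i++) {
                queue.put("spec-" + i);
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }
}
```

With a capacity of 100 and 150 items, the first method stops at 100 accepted items, while the second drains all 150 because the consumer is already running when the producer starts.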
[jira] [Created] (GOBBLIN-303) Compaction can generate zero sized output when MR is in speculative mode
Kuai Yu created GOBBLIN-303: --- Summary: Compaction can generate zero sized output when MR is in speculative mode Key: GOBBLIN-303 URL: https://issues.apache.org/jira/browse/GOBBLIN-303 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu Priority: Minor Currently, if the MR job uses speculative mode, the output is very likely to contain a zero-sized file generated by a killed task attempt. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-277) Add a lock to make multihop thread safe
Kuai Yu created GOBBLIN-277: --- Summary: Add a lock to make multihop thread safe Key: GOBBLIN-277 URL: https://issues.apache.org/jira/browse/GOBBLIN-277 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-268) Unique job uri and job name generation for GaaS
Kuai Yu created GOBBLIN-268: --- Summary: Unique job uri and job name generation for GaaS Key: GOBBLIN-268 URL: https://issues.apache.org/jira/browse/GOBBLIN-268 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-252) Add some azkaban related constants
Kuai Yu created GOBBLIN-252: --- Summary: Add some azkaban related constants Key: GOBBLIN-252 URL: https://issues.apache.org/jira/browse/GOBBLIN-252 Project: Apache Gobblin Issue Type: Improvement Reporter: Kuai Yu Assignee: Kuai Yu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (GOBBLIN-241) Allow multiple datasets send different lineage event for kafka
[ https://issues.apache.org/jira/browse/GOBBLIN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-241: Summary: Allow multiple datasets send different lineage event for kafka (was: Add task level lineage submission for kafka lineage event support) > Allow multiple datasets send different lineage event for kafka > -- > > Key: GOBBLIN-241 > URL: https://issues.apache.org/jira/browse/GOBBLIN-241 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu > > This task is mainly to add or refactor existing lineage events support. Allow > task level publisher to submit lineage event. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-244) Need additional info for gobblin tracking hourly-deduped
Kuai Yu created GOBBLIN-244: --- Summary: Need additional info for gobblin tracking hourly-deduped Key: GOBBLIN-244 URL: https://issues.apache.org/jira/browse/GOBBLIN-244 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu Add the previous record count and the number of execution runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-241) Add task level lineage submission for kafka lineage event support
Kuai Yu created GOBBLIN-241: --- Summary: Add task level lineage submission for kafka lineage event support Key: GOBBLIN-241 URL: https://issues.apache.org/jira/browse/GOBBLIN-241 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu This task is mainly to add or refactor existing lineage events support. Allow task level publisher to submit lineage event. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-235) Prevent log warnings when TaskStateCollectorService has no task states detected
Kuai Yu created GOBBLIN-235: --- Summary: Prevent log warnings when TaskStateCollectorService has no task states detected Key: GOBBLIN-235 URL: https://issues.apache.org/jira/browse/GOBBLIN-235 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu Need to adjust log level from warning to debug -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-233) Add concurrent map to avoid multiple job submission from GobblinHelixJobScheduler
Kuai Yu created GOBBLIN-233: --- Summary: Add concurrent map to avoid multiple job submission from GobblinHelixJobScheduler Key: GOBBLIN-233 URL: https://issues.apache.org/jira/browse/GOBBLIN-233 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu The current Helix job scheduler doesn't check whether an existing job of the same type is already running in the queue. We need some lock-like protection to avoid multiple submissions of the same job and so reduce the workload on Gobblin and Helix. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
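The lock-like protection asked for in GOBBLIN-233 can be sketched with a concurrent map. This is a minimal illustration under assumptions: JobDedupSketch, trySubmit, and markFinished are hypothetical names, and jobs are identified by a plain string rather than by whatever key GobblinHelixJobScheduler actually uses.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class JobDedupSketch {
    // Names of jobs that are currently in flight.
    private final ConcurrentMap<String, Boolean> running = new ConcurrentHashMap<>();

    /** Returns true if the job was accepted, false if a job with the same name is already running. */
    public boolean trySubmit(String jobName) {
        // putIfAbsent is atomic, so only one of several concurrent callers sees null.
        return running.putIfAbsent(jobName, Boolean.TRUE) == null;
    }

    /** Must be called when the job completes or fails, so that it can be scheduled again. */
    public void markFinished(String jobName) {
        running.remove(jobName);
    }
}
```

A scheduler using this shape would call trySubmit before handing a job to Helix and skip the submission when it returns false.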
[jira] [Commented] (GOBBLIN-214) Filtering doesn't work in FileListUtils:listFilesRecursively
[ https://issues.apache.org/jira/browse/GOBBLIN-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133645#comment-16133645 ] Kuai Yu commented on GOBBLIN-214: - The problem came up when we tried to use this method to filter out all AVRO files with a base directory passed in as the argument. With the previous logic, the filter was applied only to directories, not to files, so if a baseDir/_schema.avsc file was present, it could not be skipped. > Filtering doesn't work in FileListUtils:listFilesRecursively > > > Key: GOBBLIN-214 > URL: https://issues.apache.org/jira/browse/GOBBLIN-214 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu > > The filtering logic for FileListUtils:listFilesRecursively was wrong. It > never applied the filter to files that are not directories. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
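The corrected behavior for GOBBLIN-214 can be sketched as below. This is not the FileListUtils implementation: Node, file, and dir are a hypothetical in-memory tree used so the example runs without a file system, and the filter that keeps only names ending in .avro is an assumption about the intended use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class RecursiveListSketch {
    /** Hypothetical in-memory file node; a real implementation would walk a file system. */
    static class Node {
        final String name;
        final List<Node> children; // null for regular files

        private Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    static Node file(String name) {
        return new Node(name, null);
    }

    static Node dir(String name, Node... children) {
        return new Node(name, Arrays.asList(children));
    }

    /** Lists the names of all regular files under dir that pass the filter. */
    static List<String> listFilesRecursively(Node dir, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (Node child : dir.children) {
            if (child.isDirectory()) {
                out.addAll(listFilesRecursively(child, filter));
            } else if (filter.test(child.name)) {
                // The filter is applied to regular files, not only to directories.
                out.add(child.name);
            }
        }
        return out;
    }
}
```

For a tree with baseDir/_schema.avsc and baseDir/hourly/part-0.avro, a filter of n -> n.endsWith(".avro") keeps part-0.avro and skips _schema.avsc.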
[jira] [Updated] (GOBBLIN-214) Filtering doesn't work in FileListUtils:listFilesRecursively
[ https://issues.apache.org/jira/browse/GOBBLIN-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu updated GOBBLIN-214: Description: The filtering logic for FileListUtils:listFilesRecursively was wrong. It never applied the filter to files that are not directories. (was: The filtering logic for FileListUtils:listFilesRecursively was wrong.) > Filtering doesn't work in FileListUtils:listFilesRecursively > > > Key: GOBBLIN-214 > URL: https://issues.apache.org/jira/browse/GOBBLIN-214 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu > > The filtering logic for FileListUtils:listFilesRecursively was wrong. It > never applied the filter to files that are not directories. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-214) Filtering doesn't work in FileListUtils:listFilesRecursively
Kuai Yu created GOBBLIN-214: --- Summary: Filtering doesn't work in FileListUtils:listFilesRecursively Key: GOBBLIN-214 URL: https://issues.apache.org/jira/browse/GOBBLIN-214 Project: Apache Gobblin Issue Type: Bug Reporter: Kuai Yu Assignee: Kuai Yu The filtering logic for FileListUtils:listFilesRecursively was wrong. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (GOBBLIN-38) Create workunitstream for CompactionSource
[ https://issues.apache.org/jira/browse/GOBBLIN-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu closed GOBBLIN-38. -- Resolution: Fixed This is a duplicate PR. We already have a workunit stream for CompactionSource. Close this one > Create workunitstream for CompactionSource > -- > > Key: GOBBLIN-38 > URL: https://issues.apache.org/jira/browse/GOBBLIN-38 > Project: Apache Gobblin > Issue Type: Task >Reporter: Kuai Yu > > *Github Url* : https://github.com/linkedin/gobblin/pull/1826 > *Github Reporter* : [~yukuai518] > *Github Created At* : 2017-05-02T22:54:52Z > *Github Updated At* : 2017-06-13T15:45:12Z > h3. Comments > > [~ibuenros] wrote on 2017-06-13T15:45:12Z : @yukuai518 what is the status of > this PR? > > *Github Url* : > https://github.com/linkedin/gobblin/pull/1826#issuecomment-308160180 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (GOBBLIN-19) dataset specific properties are ignored by KafkaBiLevelWorkUnitPacker
[ https://issues.apache.org/jira/browse/GOBBLIN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuai Yu reassigned GOBBLIN-19: -- Assignee: Kuai Yu Sprint: Apache Gobblin 170807 > dataset specific properties are ignored by KafkaBiLevelWorkUnitPacker > - > > Key: GOBBLIN-19 > URL: https://issues.apache.org/jira/browse/GOBBLIN-19 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Clemens Valiente >Assignee: Kuai Yu > > I failed to get dataset.specific.props to work on our jobs, and I think I > found the reason: > in KafkaSource.getWorkUnitForTopicPartition the properties are added > correctly to the individual workunits. > The KafkaBiLevelWorkUnitPacker then assigns the WorkUnits to their bins and > combines them into one WorkUnit in squeezeMultiWorkUnit() but doesn't copy > over the topicSpecificSettings. > Using the KafkaSingleLevelWorkUnitPacker works fine with > dataset.specific.props since it doesn't call squeezeMultiWorkUnit on > non-empty workUnits. > > *Github Url* : https://github.com/linkedin/gobblin/issues/1901 > *Github Reporter* : [~cvaliente] > *Github Created At* : 2017-05-26T09:25:54Z > *Github Updated At* : 2017-05-31T06:39:04Z > h3. Comments > > [~cvaliente] wrote on 2017-05-26T10:55:37Z : fix in #1903 > > *Github Url* : > https://github.com/linkedin/gobblin/issues/1901#issuecomment-304253329 > > [~stakiar] wrote on 2017-05-30T17:42:07Z : Doesn't > `KafkaSource#addTopicSpecificPropsToWorkUnits` handle adding dataset specific > configuration? That method is run after the bin-packing is done. So if > `dataset.specific.props` isn't working I would guess the bug would be in that > method. > > *Github Url* : > https://github.com/linkedin/gobblin/issues/1901#issuecomment-304953996 > > [~cvaliente] wrote on 2017-05-31T06:39:04Z : You are right, that wasn't yet > implemented in 0.9 and I forgot to check upstream. 
> > *Github Url* : > https://github.com/linkedin/gobblin/issues/1901#issuecomment-305098396 -- This message was sent by Atlassian JIRA (v6.4.14#64029)