[jira] [Resolved] (GOBBLIN-277) Add a lock to make multihop thread safe

2020-07-24 Thread Kuai Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu resolved GOBBLIN-277.
-
Resolution: Fixed

> Add a lock to make multihop thread safe
> ---
>
> Key: GOBBLIN-277
> URL: https://issues.apache.org/jira/browse/GOBBLIN-277
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor

2020-07-24 Thread Kuai Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu resolved GOBBLIN-373.
-
Resolution: Fixed

> Expose task executor auto scale metrics to external sensor
> --
>
> Key: GOBBLIN-373
> URL: https://issues.apache.org/jira/browse/GOBBLIN-373
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This is used for LinkedIn inGraph integration.





[jira] [Created] (GOBBLIN-1134) Improve FAQs

2020-04-29 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-1134:


 Summary: Improve FAQs
 Key: GOBBLIN-1134
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1134
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Add more details to the FAQ for users who hit an NPE with the error message below:
h5. Gradle Build Fails With {{Cannot invoke method getURLs on null object}}





[jira] [Created] (GOBBLIN-1129) clean up staging table created by avro2orc pipeline

2020-04-27 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-1129:


 Summary: clean up staging table created by avro2orc pipeline
 Key: GOBBLIN-1129
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1129
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


During the avro2orc conversion, many staging tables are created. When the 
pipeline fails they are not cleaned up, and the next execution does not clean 
them up either, so the staging tables take up a lot of space in the Hive 
metastore.





[jira] [Resolved] (GOBBLIN-1124) Error message should include throwable details when http converter hits a failure

2020-04-22 Thread Kuai Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu resolved GOBBLIN-1124.
--
Resolution: Fixed

> Error message should include throwable details when http converter hits a 
> failure
> -
>
> Key: GOBBLIN-1124
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1124
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-1124) Error message should include throwable details when http converter hits a failure

2020-04-21 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-1124:


 Summary: Error message should include throwable details when http 
converter hits a failure
 Key: GOBBLIN-1124
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1124
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-1049) Move workunit commit logic to the end of publish().

2020-02-13 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-1049:


 Summary: Move workunit commit logic to the end of publish().
 Key: GOBBLIN-1049
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1049
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


We should not blindly commit workunit in the BaseDataPublisher.





[jira] [Created] (GOBBLIN-962) Refactor RecursiveCopyableDataset so that the copy entities generation logic can be reused.

2019-11-14 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-962:
---

 Summary: Refactor RecursiveCopyableDataset so that the copy 
entities generation logic can be reused.
 Key: GOBBLIN-962
 URL: https://issues.apache.org/jira/browse/GOBBLIN-962
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Refactor RecursiveCopyableDataset so that the copy entities generation logic 
can be reused.





[jira] [Created] (GOBBLIN-915) Extract cannot parse the timezone

2019-10-17 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-915:
---

 Summary: Extract cannot parse the timezone
 Key: GOBBLIN-915
 URL: https://issues.apache.org/jira/browse/GOBBLIN-915
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-889) FileContext doesn't use the correct writer fs uri

2019-09-25 Thread Kuai Yu (Jira)
Kuai Yu created GOBBLIN-889:
---

 Summary: FileContext doesn't use the correct writer fs uri
 Key: GOBBLIN-889
 URL: https://issues.apache.org/jira/browse/GOBBLIN-889
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu


If we are using an HDFS URI, the current FsDataWriter seems to ignore that URI 
during FileContext creation; it simply creates the FileContext based on the 
Configuration object.





[jira] [Updated] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level

2019-08-12 Thread Kuai Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-851:

Description: 
We had problems when the table-level schema and the partition-level schema 
diverge. Consider the case where a user registers two partitions, 2019/08/10 
and 2019/08/11, but the schema changes in between (S1->S2). Now the table level 
has schema S2, but partition 2019/08/10 still has schema S1. 

A query using the latest schema will fail on the old partition.

> Provide capability to disable hive schema registration in partition level
> -
>
> Key: GOBBLIN-851
> URL: https://issues.apache.org/jira/browse/GOBBLIN-851
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>
> We had problems when the table-level schema and the partition-level schema 
> diverge. Consider the case where a user registers two partitions, 2019/08/10 
> and 2019/08/11, but the schema changes in between (S1->S2). Now the table 
> level has schema S2, but partition 2019/08/10 still has schema S1. 
> A query using the latest schema will fail on the old partition.





[jira] [Created] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level

2019-08-12 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-851:
---

 Summary: Provide capability to disable hive schema registration in 
partition level
 Key: GOBBLIN-851
 URL: https://issues.apache.org/jira/browse/GOBBLIN-851
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-813) Make SFDC connector support encrypted Salesforce client id and client secret

2019-06-24 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-813:
---

 Summary: Make SFDC connector support encrypted Salesforce client 
id and client secret
 Key: GOBBLIN-813
 URL: https://issues.apache.org/jira/browse/GOBBLIN-813
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-783) Fix the double referencing issue for job type config

2019-05-24 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-783:
---

 Summary: Fix the double referencing issue for job type config
 Key: GOBBLIN-783
 URL: https://issues.apache.org/jira/browse/GOBBLIN-783
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-751) Make enforced file size matching to be configurable

2019-04-23 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-751:
---

 Summary: Make enforced file size matching to be configurable
 Key: GOBBLIN-751
 URL: https://issues.apache.org/jira/browse/GOBBLIN-751
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Make enforced file size matching to be configurable





[jira] [Created] (GOBBLIN-729) Add version strategy support for HiveDataset copy

2019-04-08 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-729:
---

 Summary: Add version strategy support for HiveDataset copy
 Key: GOBBLIN-729
 URL: https://issues.apache.org/jira/browse/GOBBLIN-729
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


This PR will add version strategy support for Hive dataset copy.





[jira] [Created] (GOBBLIN-713) Lazy load job specification from job catalog to avoid OOM issue when JobCatalog is bootup.

2019-03-26 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-713:
---

 Summary: Lazy load job specification from job catalog to avoid OOM 
issue when JobCatalog is bootup.
 Key: GOBBLIN-713
 URL: https://issues.apache.org/jira/browse/GOBBLIN-713
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Today, whenever the job catalog is restarted, all the job specs are loaded into 
memory. This can cause OOM issues under our production load. This ticket was 
created to provide an easy way to load a job spec without materializing all the 
job specs in memory. 
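
The lazy-loading idea can be sketched as follows. This is a minimal illustration, not the JobCatalog API: the class `LazyJobCatalog`, its `loader` function, and the use of plain strings for URIs and specs are all assumptions made for the example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: only the spec URIs are held in memory;
// each spec is materialized when the caller asks for it.
final class LazyJobCatalog {
    private final List<String> specUris;            // cheap to keep around
    private final Function<String, String> loader;  // assumed: reads one spec by URI

    LazyJobCatalog(List<String> specUris, Function<String, String> loader) {
        this.specUris = specUris;
        this.loader = loader;
    }

    /** Returns an iterator that loads one spec per call to next(). */
    Iterator<String> specIterator() {
        final Iterator<String> uris = specUris.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return uris.hasNext();
            }

            @Override
            public String next() {
                // The spec is read here, not when the catalog starts up.
                return loader.apply(uris.next());
            }
        };
    }
}
```

With this shape, restarting the catalog costs one small entry per job, and at most one fully loaded spec needs to be live at a time if the caller processes specs one by one.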





[jira] [Created] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-03-25 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-712:
---

 Summary: Add version strategy for configbased dataset copy
 Key: GOBBLIN-712
 URL: https://issues.apache.org/jira/browse/GOBBLIN-712
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-703) Allow planning job to be run in a non-blocking way

2019-03-19 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-703:
---

 Summary: Allow planning job to be run in a non-blocking way
 Key: GOBBLIN-703
 URL: https://issues.apache.org/jira/browse/GOBBLIN-703
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Today every planning job runs in a dedicated thread pool and waits until the 
full execution has completed. This requires a lot of system resources and a 
dedicated monitoring thread. The improvement here is to reduce the waiting time 
on the dedicated monitoring thread: once the planning job is submitted to 
Helix, we don't need to wait for the job to complete. Job status monitoring 
will be handled by GaaS monitoring. 

By doing this, we free most of the thread pool resources, because each 
monitoring thread returns immediately after it finishes the job 
submission.
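
The difference between the two launch styles can be sketched in plain Java. This is an illustration under assumptions, not Gobblin or Helix code: `PlanningJobSubmitter` and `JobBackend` are hypothetical names, and a `CompletableFuture` stands in for the job's completion status.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of blocking vs. non-blocking planning job launch.
final class PlanningJobSubmitter {

    /** Assumed stand-in for the backend; the future completes when the job finishes. */
    interface JobBackend {
        CompletableFuture<Void> submit(String jobId);
    }

    /** Blocking style: the calling thread is held until the job has completed. */
    static void launchBlocking(JobBackend backend, String jobId) {
        backend.submit(jobId).join();
    }

    /**
     * Non-blocking style: the calling thread returns right after submission.
     * Completion is observed elsewhere (the ticket delegates this to GaaS monitoring).
     */
    static CompletableFuture<Void> launchNonBlocking(JobBackend backend, String jobId) {
        return backend.submit(jobId);
    }
}
```

In the non-blocking variant the thread pool only needs enough threads to cover submission latency, rather than one thread per running job.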

 





[jira] [Created] (GOBBLIN-690) Relaunch check for the planning job is not correct

2019-02-25 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-690:
---

 Summary: Relaunch check for the planning job is not correct
 Key: GOBBLIN-690
 URL: https://issues.apache.org/jira/browse/GOBBLIN-690
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-683) Azkaban client should retry if session gets expired

2019-02-14 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-683:
---

 Summary: Azkaban client should retry if session gets expired
 Key: GOBBLIN-683
 URL: https://issues.apache.org/jira/browse/GOBBLIN-683
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-664) AzkabanClient session refresh logic is not configurable

2019-01-11 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-664:
---

 Summary: AzkabanClient session refresh logic is not configurable
 Key: GOBBLIN-664
 URL: https://issues.apache.org/jira/browse/GOBBLIN-664
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-661) Prevent jobs resubmission after manager failure

2019-01-08 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-661:
---

 Summary: Prevent jobs resubmission after manager failure
 Key: GOBBLIN-661
 URL: https://issues.apache.org/jira/browse/GOBBLIN-661
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


In a Gobblin cluster, if the manager fails and is relaunched, all the jobs 
persisted in the job catalog will be relaunched. This can cause a few issues:

1) Scalability: the unfinished jobs may originally have been submitted at 
different points in time; if all of them are now submitted at the same time, it 
can cause a performance issue.

2) Wasted effort: because the unfinished job now needs to be deleted, we have 
to kill the existing running job and resubmit it.

 

In this change, we improve both 1) and 2):

1) In taskdriver mode, we delete the job spec once we submit to Helix. We 
believe Helix is durable and none of the submitted jobs will be lost, so we can 
safely delete the job specs. After the next reboot the manager won't see the 
deleted job specs, so no resubmission is needed. 

2) In taskdriver mode, we clean up running Helix jobs. If a job is a planning 
job, we won't delete it; instead we just let it run to the end.





[jira] [Updated] (GOBBLIN-655) Allow helix jobs to have job type set

2018-12-13 Thread Kuai Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-655:

Description: This is required because Helix will use job type for the quota 
assignment. For job prioritization, we need to have different quota for 
different priority jobs.

> Allow helix jobs to have job type set
> -
>
> Key: GOBBLIN-655
> URL: https://issues.apache.org/jira/browse/GOBBLIN-655
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This is required because Helix will use job type for the quota assignment. 
> For job prioritization, we need to have different quota for different 
> priority jobs.





[jira] [Created] (GOBBLIN-655) Allow helix jobs to have job type set

2018-12-13 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-655:
---

 Summary: Allow helix jobs to have job type set
 Key: GOBBLIN-655
 URL: https://issues.apache.org/jira/browse/GOBBLIN-655
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-652) gobblin cluster doesn't have helix related metrics

2018-12-11 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-652:
---

 Summary: gobblin cluster doesn't have helix related metrics
 Key: GOBBLIN-652
 URL: https://issues.apache.org/jira/browse/GOBBLIN-652
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-649) Add task driver cluster

2018-12-10 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-649:
---

 Summary: Add task driver cluster
 Key: GOBBLIN-649
 URL: https://issues.apache.org/jira/browse/GOBBLIN-649
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-647) Move early stop feature to task driver

2018-12-06 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-647:
---

 Summary: Move early stop feature to task driver
 Key: GOBBLIN-647
 URL: https://issues.apache.org/jira/browse/GOBBLIN-647
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-639) Change serder method to static from RequesterService

2018-11-27 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-639:
---

 Summary: Change serder method to static from RequesterService
 Key: GOBBLIN-639
 URL: https://issues.apache.org/jira/browse/GOBBLIN-639
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-626) Fix the wrong planning job tag key

2018-11-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-626:
---

 Summary: Fix the wrong planning job tag key
 Key: GOBBLIN-626
 URL: https://issues.apache.org/jira/browse/GOBBLIN-626
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-625) Distributed job launcher doesn't have helix tagging support

2018-11-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-625:
---

 Summary: Distributed job launcher doesn't have helix tagging 
support
 Key: GOBBLIN-625
 URL: https://issues.apache.org/jira/browse/GOBBLIN-625
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


This ticket was created to add Helix tagging support for the distributed job 
launcher.

In the distributed job launcher, when a planning job is sent out, it can be 
tagged with "gobblin.cluster.helixPlanningJobTag". This differs from 
"gobblin.cluster.helixJobTag" because the latter is used for the actual job 
instead of a planning job.
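
A small sketch of how the two keys could be told apart when a job is submitted. Only the two key strings come from the ticket; the class `HelixTagSelector`, the method `tagFor`, and the use of `java.util.Properties` are assumptions for illustration.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper: picks the tag key that matches the kind of job.
final class HelixTagSelector {
    static final String PLANNING_JOB_TAG_KEY = "gobblin.cluster.helixPlanningJobTag";
    static final String JOB_TAG_KEY = "gobblin.cluster.helixJobTag";

    /** Returns the configured tag for the job, or empty if no tag is set. */
    static Optional<String> tagFor(Properties jobProps, boolean isPlanningJob) {
        String key = isPlanningJob ? PLANNING_JOB_TAG_KEY : JOB_TAG_KEY;
        return Optional.ofNullable(jobProps.getProperty(key));
    }
}
```

Keeping the keys separate lets a planning job and the actual job it produces be given different tags.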





[jira] [Created] (GOBBLIN-622) Avoid to serialize all previous workunits in SourceState to save both memory and diskspace

2018-10-26 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-622:
---

 Summary: Avoid to serialize all previous workunits in SourceState 
to save both memory and diskspace
 Key: GOBBLIN-622
 URL: https://issues.apache.org/jira/browse/GOBBLIN-622
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-620) NPE when catalog metrics is not enabled

2018-10-25 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-620:
---

 Summary: NPE when catalog metrics is not enabled
 Key: GOBBLIN-620
 URL: https://issues.apache.org/jira/browse/GOBBLIN-620
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


When getStandardMetrics is invoked, the metrics object is null if metrics are 
not enabled. This puts a null element into the ImmutableList, which raises an 
NPE.
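
The failure mode and one possible guard can be sketched as follows. This is not the Gobblin code: `CatalogMetricsExample` and `StandardMetrics` are hypothetical names, and the JDK's `List.of` (Java 9+) is used in place of Guava's `ImmutableList.of`, which likewise rejects null elements.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the null-metrics problem and a guard against it.
final class CatalogMetricsExample {
    static final class StandardMetrics {}

    private final StandardMetrics metrics;  // null when metrics are disabled

    CatalogMetricsExample(boolean metricsEnabled) {
        this.metrics = metricsEnabled ? new StandardMetrics() : null;
    }

    /** Unsafe: throws NullPointerException when metrics are disabled. */
    List<StandardMetrics> unsafe() {
        return List.of(metrics);
    }

    /** Guarded: returns an empty list instead of wrapping a null element. */
    List<StandardMetrics> guarded() {
        if (metrics == null) {
            return Collections.<StandardMetrics>emptyList();
        }
        return List.of(metrics);
    }
}
```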





[jira] [Created] (GOBBLIN-619) Fix the metrics name for GobblinHelixJobScheduler

2018-10-24 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-619:
---

 Summary: Fix the metrics name for GobblinHelixJobScheduler
 Key: GOBBLIN-619
 URL: https://issues.apache.org/jira/browse/GOBBLIN-619
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-618) Remove unnecessary methods from StandardMetricsBridge

2018-10-23 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-618:
---

 Summary: Remove unnecessary methods from StandardMetricsBridge
 Key: GOBBLIN-618
 URL: https://issues.apache.org/jira/browse/GOBBLIN-618
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-617) Add distributed job launcher metrics and some refactoring.

2018-10-23 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-617:
---

 Summary: Add distributed job launcher metrics and some refactoring.
 Key: GOBBLIN-617
 URL: https://issues.apache.org/jira/browse/GOBBLIN-617
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


Add metrics for GobblinHelixJobTask.
 Refactored metrics for GobblinHelixJobScheduler
 Refactored metrics for GobblinHelixJobLauncher





[jira] [Created] (GOBBLIN-615) Make LWM==HWM a valid interval in QueryBaseSource

2018-10-19 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-615:
---

 Summary: Make LWM==HWM a valid interval in QueryBaseSource
 Key: GOBBLIN-615
 URL: https://issues.apache.org/jira/browse/GOBBLIN-615
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


We have seen many issues with DateWatermark where the job intermittently fails 
every other day. The reason is as follows:
 # On 10-02 at 17:47 the job pulls with logindate >= 2018-10-01 (HWM = 10-02; 
when the job finishes, Actual_HWM is 10-02).
 # On the same date, 10-02, if the job pulls again, we have LWM = 10-03 and 
HWM = 10-02, so the job fails as expected.
 # On 10-03 at 17:47 the job fails to generate any workunits because now LWM = 
Actual_HWM + 1 = 10-03 and HWM = 10-03. According to 
DateWatermark::getIntervals(), the startTime must be less than the endTime to 
generate an interval.
 # On 10-04 at 17:47 the job recovers because LWM stays at 10-03 and HWM = 
10-04, so a valid interval is generated again.

The fix here is to let DateWatermark generate an interval at step 3, so that we 
won't have an intermittent failure there.

However, this fix will cause another problem. Today we can have missing data 
in steps 1 and 4, because step 1 pulls data for 10-02 too early and step 4 
pulls data for 10-04 too early, but at least step 3 pulls the whole data for 
10-03. After this fix, the data for 10-03 will be pulled too early as well. 
This fix therefore needs to work together with the Cutoff feature, so that we 
only pull 10-01's data on 10-02.
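
The LWM == HWM case in the steps above can be sketched as follows. This is not the DateWatermark implementation: the class `DayIntervals`, the method `intervals`, and the `allowEqual` flag are hypothetical, and one-day partitions are assumed.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of interval generation between two date watermarks.
final class DayIntervals {

    /**
     * Splits the range from lwm to hwm into one-day {start, end} pairs.
     * With allowEqual=false, start must be strictly before end, so lwm == hwm
     * yields nothing (step 3 above). With allowEqual=true, lwm == hwm yields
     * a single interval covering that day.
     */
    static List<LocalDate[]> intervals(LocalDate lwm, LocalDate hwm, boolean allowEqual) {
        List<LocalDate[]> result = new ArrayList<>();
        if (lwm.isAfter(hwm)) {
            return result;  // LWM > HWM is never a valid range (step 2 above)
        }
        if (lwm.isEqual(hwm)) {
            if (allowEqual) {
                result.add(new LocalDate[] {lwm, hwm});
            }
            return result;
        }
        LocalDate start = lwm;
        while (start.isBefore(hwm)) {
            LocalDate end = start.plusDays(1);
            result.add(new LocalDate[] {start, end});
            start = end;
        }
        return result;
    }
}
```

For LWM = HWM = 2018-10-03, the strict variant returns no intervals while the relaxed variant returns one, which is the behaviour change the ticket proposes for step 3.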

Thanks,

Kuai





[jira] [Created] (GOBBLIN-608) Allow user to configure the fork operation timeout

2018-10-09 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-608:
---

 Summary: Allow user to configure the fork operation timeout
 Key: GOBBLIN-608
 URL: https://issues.apache.org/jira/browse/GOBBLIN-608
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


Allow fork operation to be configured with max waiting time.





[jira] [Created] (GOBBLIN-602) Allow AzkabanProducer to be customized

2018-10-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-602:
---

 Summary: Allow AzkabanProducer to be customized
 Key: GOBBLIN-602
 URL: https://issues.apache.org/jira/browse/GOBBLIN-602
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


Use Java reflection to construct the AzkabanProducer instance instead of always 
assuming the class is AzkabanProducer.class.
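
Reflection-based construction of this kind can be sketched as follows. Nothing here is the Gobblin API: the interface `SpecProducer`, the classes `DefaultProducer` and `CustomProducer`, the factory, and the config key `producer.class` are all hypothetical, and a public constructor taking `Properties` is assumed.

```java
import java.util.Properties;

// Hypothetical types standing in for the producer hierarchy.
interface SpecProducer {
    String name();
}

class DefaultProducer implements SpecProducer {
    public DefaultProducer(Properties props) {}
    public String name() { return "default"; }
}

class CustomProducer implements SpecProducer {
    public CustomProducer(Properties props) {}
    public String name() { return "custom"; }
}

final class ProducerFactory {
    static final String CLASS_KEY = "producer.class";  // assumed config key

    /** Instantiates the configured class, falling back to DefaultProducer. */
    static SpecProducer create(Properties props) {
        String className = props.getProperty(CLASS_KEY, DefaultProducer.class.getName());
        try {
            Class<? extends SpecProducer> cls =
                Class.forName(className).asSubclass(SpecProducer.class);
            return cls.getConstructor(Properties.class).newInstance(props);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + className, e);
        }
    }
}
```

A subclass can then be swapped in through configuration alone, without changing the code that creates the producer.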





[jira] [Created] (GOBBLIN-601) Add cancellation to AzkabanClient

2018-10-03 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-601:
---

 Summary: Add cancellation to AzkabanClient
 Key: GOBBLIN-601
 URL: https://issues.apache.org/jira/browse/GOBBLIN-601
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-591) Allow user to pass in the customized http client for azkaban client

2018-09-17 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-591:
---

 Summary: Allow user to pass in the customized http client for 
azkaban client
 Key: GOBBLIN-591
 URL: https://issues.apache.org/jira/browse/GOBBLIN-591
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-588) Remove final decorator to allow password be overwritten

2018-09-13 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-588:
---

 Summary: Remove final decorator to allow password be overwritten
 Key: GOBBLIN-588
 URL: https://issues.apache.org/jira/browse/GOBBLIN-588
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-584) Fix the helix key configuration naming.

2018-09-11 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-584:
---

 Summary: Fix the helix key configuration naming.
 Key: GOBBLIN-584
 URL: https://issues.apache.org/jira/browse/GOBBLIN-584
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-575) Remove scala dependencies

2018-09-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-575:
---

 Summary: Remove scala dependencies
 Key: GOBBLIN-575
 URL: https://issues.apache.org/jira/browse/GOBBLIN-575
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Some of the Scala dependencies were added in build.gradle, which caused some 
scala binary detection tools to fail within LinkedIn. 





[jira] [Created] (GOBBLIN-574) elasticsearch-dep module failed to build

2018-09-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-574:
---

 Summary: elasticsearch-dep module failed to build
 Key: GOBBLIN-574
 URL: https://issues.apache.org/jira/browse/GOBBLIN-574
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


> Configure project :gobblin-distribution
Using flavor:standard for project gobblin-distribution
Download https://plugins.gradle.org/m2/com/commercehub/gradle/plugin/avro-base/com.commercehub.gradle.plugin.avro-base.gradle.plugin/0.9.0/com.commercehub.gradle.plugin.avro-base.gradle.plugin-0.9.0.pom
Download https://plugins.gradle.org/m2/com/commercehub/gradle/plugin/gradle-avro-plugin/0.9.0/gradle-avro-plugin-0.9.0.pom
Download https://plugins.gradle.org/m2/com/github/johnrengelman/shadow/com.github.johnrengelman.shadow.gradle.plugin/1.2.4/com.github.johnrengelman.shadow.gradle.plugin-1.2.4.pom

FAILURE: Build failed with an exception.

What went wrong:
A problem occurred configuring project ':gobblin-modules:gobblin-elasticsearch-deps'.
> Could not get unknown property 'mavenDeployer' for repository container of type org.gradle.api.internal.artifacts.dsl.DefaultRepositoryHandler.

 

 





[jira] [Created] (GOBBLIN-550) When RuntimeException occurred, alwaysDelete flag doesn't work

2018-08-01 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-550:
---

 Summary: When RuntimeException occurred, alwaysDelete flag doesn't 
work
 Key: GOBBLIN-550
 URL: https://issues.apache.org/jira/browse/GOBBLIN-550
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-546) Use appropriate Lists package

2018-07-30 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-546:
---

 Summary: Use appropriate Lists package
 Key: GOBBLIN-546
 URL: https://issues.apache.org/jira/browse/GOBBLIN-546
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-536) Allow user to configure connection string properties in mysql extractor

2018-07-16 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-536:
---

 Summary: Allow user to configure connection string properties in 
mysql extractor
 Key: GOBBLIN-536
 URL: https://issues.apache.org/jira/browse/GOBBLIN-536
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-535) Add second hop for distributed job launcher

2018-07-13 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-535:
---

 Summary: Add second hop for distributed job launcher
 Key: GOBBLIN-535
 URL: https://issues.apache.org/jira/browse/GOBBLIN-535
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu
Assignee: Kuai Yu


In the previous PR, [https://github.com/apache/incubator-gobblin/pull/2360], a 
planning job can be distributed to a remote node, but the remote node does a 
no-op.

In this PR, the remote node will do the actual GobblinHelixJobLauncher work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-532) Always delete jobSpec no matter if the job is successful or not

2018-07-12 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-532:
---

 Summary: Always delete jobSpec no matter if the job is successful 
or not
 Key: GOBBLIN-532
 URL: https://issues.apache.org/jira/browse/GOBBLIN-532
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-522) Multiple build issues

2018-06-29 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-522:
---

 Summary: Multiple build issues
 Key: GOBBLIN-522
 URL: https://issues.apache.org/jira/browse/GOBBLIN-522
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Updated] (GOBBLIN-490) Add planning job execution launcher

2018-06-13 Thread Kuai Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-490:

Description: This new job launcher will forward the original job to one of 
the GobblinTaskRunner(s). Instead of executing the task driver logic on 
GobblinClusterManager, the task driver logic now can be run on 
GobblinTaskRunner.  (was: This new job launcher will submit the original job to 
Helix instead of the generated work units. This will allow Helix to 
re-distribute the original jobs to different worker nodes. Each worker then 
processes the original jobs and launches its own GobblinHelixJobLauncher. By 
doing this, we will relieve the manager node because the task driver logic 
(mainly generating work units) is now distributed to worker nodes.)

> Add planning job execution launcher 
> 
>
> Key: GOBBLIN-490
> URL: https://issues.apache.org/jira/browse/GOBBLIN-490
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This new job launcher will forward the original job to one of the 
> GobblinTaskRunner(s). Instead of executing the task driver logic on 
> GobblinClusterManager, the task driver logic now can be run on 
> GobblinTaskRunner.





[jira] [Updated] (GOBBLIN-490) Add planning job execution launcher

2018-06-13 Thread Kuai Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-490:

Summary: Add planning job execution launcher   (was: Allow job to be 
distributed by Helix)

> Add planning job execution launcher 
> 
>
> Key: GOBBLIN-490
> URL: https://issues.apache.org/jira/browse/GOBBLIN-490
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This new job launcher will submit the original job to Helix instead of the 
> generated work units. This will allow Helix to re-distribute the original 
> jobs to different worker nodes. Each worker then processes the original jobs 
> and launches its own GobblinHelixJobLauncher. By doing this, we will relieve 
> the manager node because the task driver logic (mainly generating work units) 
> is now distributed to worker nodes.





[jira] [Created] (GOBBLIN-510) Decouple JobExecutionLauncher and JobExecutionDriver

2018-06-06 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-510:
---

 Summary: Decouple JobExecutionLauncher and JobExecutionDriver
 Key: GOBBLIN-510
 URL: https://issues.apache.org/jira/browse/GOBBLIN-510
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu
Assignee: Kuai Yu


Today JobExecutionLauncher and JobExecutionDriver are coupled: when 
JobExecutionLauncher invokes launchJob, a JobExecutionDriver is returned 
immediately. This does not suit Gobblin cluster, because the launcher might run 
on the manager node while the actual driver logic runs on a worker node. We 
need some refactoring to decouple the two.





[jira] [Created] (GOBBLIN-506) Job tagging support in Gobblin cluster

2018-05-29 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-506:
---

 Summary: Job tagging support in Gobblin cluster
 Key: GOBBLIN-506
 URL: https://issues.apache.org/jira/browse/GOBBLIN-506
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-504) HiveMetastoreClientPool has findbugsMain issue due to unprotected static variable initialization

2018-05-24 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-504:
---

 Summary: HiveMetastoreClientPool has findbugsMain issue due to 
unprotected static variable initialization
 Key: GOBBLIN-504
 URL: https://issues.apache.org/jira/browse/GOBBLIN-504
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-497) GobblinHelixJobScheduler should not start scheduling before the scheduler service is up

2018-05-21 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-497:
---

 Summary: GobblinHelixJobScheduler should not start scheduling 
before the scheduler service is up
 Key: GOBBLIN-497
 URL: https://issues.apache.org/jira/browse/GOBBLIN-497
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-495) FlowSpec should be deleted if this is run once flow

2018-05-17 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-495:
---

 Summary: FlowSpec should be deleted if this is run once flow
 Key: GOBBLIN-495
 URL: https://issues.apache.org/jira/browse/GOBBLIN-495
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Updated] (GOBBLIN-495) FlowSpec should be deleted if this is run once flow

2018-05-17 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-495:

Issue Type: Bug  (was: New Feature)

> FlowSpec should be deleted if this is run once flow
> ---
>
> Key: GOBBLIN-495
> URL: https://issues.apache.org/jira/browse/GOBBLIN-495
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>






[jira] [Updated] (GOBBLIN-490) Allow job to be distributed by Helix

2018-05-10 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-490:

Description: This new job launcher will submit the original job to Helix 
instead of the generated work units. This allows Helix to redistribute the 
original jobs to different worker nodes. Each worker then processes the 
original jobs and launches its own GobblinHelixJobLauncher. Doing this relieves 
the manager node, because the task driver logic (mainly generating work units) 
is now distributed to the worker nodes.

> Allow job to be distributed by Helix
> 
>
> Key: GOBBLIN-490
> URL: https://issues.apache.org/jira/browse/GOBBLIN-490
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This new job launcher will submit the original job to Helix instead of the 
> generated work units. This allows Helix to redistribute the original jobs 
> to different worker nodes. Each worker then processes the original jobs and 
> launches its own GobblinHelixJobLauncher. Doing this relieves the manager 
> node, because the task driver logic (mainly generating work units) is now 
> distributed to the worker nodes.





[jira] [Created] (GOBBLIN-490) Allow job to be distributed by Helix

2018-05-10 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-490:
---

 Summary: Allow job to be distributed by Helix
 Key: GOBBLIN-490
 URL: https://issues.apache.org/jira/browse/GOBBLIN-490
 Project: Apache Gobblin
  Issue Type: New Feature
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-484) Propagate fork exception to task commit

2018-05-02 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-484:
---

 Summary: Propagate fork exception to task commit
 Key: GOBBLIN-484
 URL: https://issues.apache.org/jira/browse/GOBBLIN-484
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


>>> Today, if an exception occurs at the task level, we do not propagate it 
>>> to the commit phase, so in fork.commit we see exceptions like this:

2018/04/30 08:03:19.369 ERROR [Task] [Task-committing-pool-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_DYNAMICS-CONTACT-438563007_1525075320170_0
at org.apache.gobblin.runtime.Task.commit(Task.java:884)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:167)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

>>> However, the root cause occurred before the commit phase, in the task 
>>> run() stage, where some records failed to process:

2018/04/30 08:03:19.352 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Processing record incurs an unexpected exception:
java.lang.IllegalStateException: Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 has failed and is no longer running
at org.apache.gobblin.runtime.fork.Fork.putRecord(Fork.java:285)
at org.apache.gobblin.runtime.Task.processRecord(Task.java:778)
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:459)
at org.apache.gobblin.runtime.Task.run(Task.java:341)
at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.353 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
java.lang.RuntimeException
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:464)
at org.apache.gobblin.runtime.Task.run(Task.java:341)
at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.368 INFO [com_2792] [TaskState

>>> Looking further into the problem, we find it is due to a record 
>>> processing timeout from the Espresso writer:

2018/04/30 08:03:19.348 ERROR [Fork-0] [ForkExecutor-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed to process data records
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on 

[jira] [Updated] (GOBBLIN-480) Allow job distribution cluster to be separated from cluster manager cluster

2018-04-29 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-480:

Description: 
Today GobblinClusterManager leverages a single Helix cluster that is 
responsible for both job distribution and cluster manager HA. This all-in-one 
mode cannot work with the Helix super controller, because GobblinClusterManager 
creates its own dedicated controller for HA handling, which is internal to the 
Gobblin framework. This architecture works, but we have gradually found it hard 
to monitor Helix behavior and debug Helix-related issues due to the lack of 
Helix task framework metrics, which come for free but are only available when 
using dedicated controllers under the Helix super controller's supervision.

To allow the migration, we separated the existing cluster into two clusters:

1. Our existing cluster remains the same, but is called the "job distribution 
cluster" in separation mode. In unit test or local deployment mode, we create 
a dedicated controller for this cluster. In production mode, we can assume 
Helix provides a dedicated controller for us.

2. A new cluster is created, called the 'manager cluster', which is 
responsible for cluster manager leadership changes. It provides a leadership 
change callback just as the all-in-one mode did.

The new 'two cluster mode' can be turned on or off by user configuration. 
Similarly, the user can configure whether a controller for job distribution 
should be created.

  was:
Today GobblinClusterManager leverages a single Helix cluster that is 
responsible for both job distribution and cluster manager HA. This all-in-one 
mode cannot work with the Helix super controller, because GobblinClusterManager 
creates its own dedicated controller for HA handling, which is internal to the 
Gobblin framework. This architecture works, but we have gradually found it hard 
to monitor Helix behavior and debug Helix-related issues due to the lack of 
Helix task framework metrics, which come for free but are only available when 
using dedicated controllers under the Helix super controller's supervision.

To allow the migration, we separated the existing cluster into two clusters:

1. Our existing cluster remains the same and is called the "job distribution 
cluster". In unit test or local deployment mode, we create a dedicated 
controller for this cluster. In production mode, we assume Helix provides 
this dedicated controller for us.

2. A new cluster is created, called the 'manager cluster', which is 
responsible for cluster manager leadership changes. It provides a leadership 
change callback just as the all-in-one mode did.

Two cluster mode can be turned on or off by user configuration, as can 
whether a controller for job distribution should be created.


> Allow job distribution cluster to be separated from cluster manager cluster
> ---
>
> Key: GOBBLIN-480
> URL: https://issues.apache.org/jira/browse/GOBBLIN-480
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> Today GobblinClusterManager leverages a single Helix cluster that is 
> responsible for both job distribution and cluster manager HA. This 
> all-in-one mode cannot work with the Helix super controller, because 
> GobblinClusterManager creates its own dedicated controller for HA handling, 
> which is internal to the Gobblin framework. This architecture works, but we 
> have gradually found it hard to monitor Helix behavior and debug 
> Helix-related issues due to the lack of Helix task framework metrics, which 
> come for free but are only available when using dedicated controllers under 
> the Helix super controller's supervision.
> To allow the migration, we separated the existing cluster into two clusters:
> 1. Our existing cluster remains the same, but is called the "job 
> distribution cluster" in separation mode. In unit test or local deployment 
> mode, we create a dedicated controller for this cluster. In production mode, 
> we can assume Helix provides a dedicated controller for us.
> 2. A new cluster is created, called the 'manager cluster', which is 
> responsible for cluster manager leadership changes. It provides a 
> leadership change callback just as the all-in-one mode did.
> The new 'two cluster mode' can be turned on or off by user configuration. 
> Similarly, the user can configure whether a controller for job distribution 
> should be created.





[jira] [Created] (GOBBLIN-480) Allow job distribution cluster to be separated from cluster manager cluster

2018-04-29 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-480:
---

 Summary: Allow job distribution cluster to be separated from 
cluster manager cluster
 Key: GOBBLIN-480
 URL: https://issues.apache.org/jira/browse/GOBBLIN-480
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


Today GobblinClusterManager leverages a single Helix cluster that is 
responsible for both job distribution and cluster manager HA. This all-in-one 
mode cannot work with the Helix super controller, because GobblinClusterManager 
creates its own dedicated controller for HA handling, which is internal to the 
Gobblin framework. This architecture works, but we have gradually found it hard 
to monitor Helix behavior and debug Helix-related issues due to the lack of 
Helix task framework metrics, which come for free but are only available when 
using dedicated controllers under the Helix super controller's supervision.

To allow the migration, we separated the existing cluster into two clusters:

1. Our existing cluster remains the same and is called the "job distribution 
cluster". In unit test or local deployment mode, we create a dedicated 
controller for this cluster. In production mode, we assume Helix provides 
this dedicated controller for us.

2. A new cluster is created, called the 'manager cluster', which is 
responsible for cluster manager leadership changes. It provides a leadership 
change callback just as the all-in-one mode did.

Two cluster mode can be turned on or off by user configuration, as can 
whether a controller for job distribution should be created.





[jira] [Created] (GOBBLIN-476) Add helix task timeout

2018-04-25 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-476:
---

 Summary: Add helix task timeout
 Key: GOBBLIN-476
 URL: https://issues.apache.org/jira/browse/GOBBLIN-476
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-473) Allow user to configure different lookback time for different datasets

2018-04-24 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-473:
---

 Summary: Allow user to configure different lookback time for 
different datasets
 Key: GOBBLIN-473
 URL: https://issues.apache.org/jira/browse/GOBBLIN-473
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-466) Reuse same connector for Salesforce dynamic partitioning

2018-04-13 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-466:
---

 Summary: Reuse same connector for Salesforce dynamic partitioning
 Key: GOBBLIN-466
 URL: https://issues.apache.org/jira/browse/GOBBLIN-466
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu


We add a getConnector method to the Salesforce source class so that:

1) Any derived class can override this method.

2) The same connector is always used to get watermark metadata.





[jira] [Created] (GOBBLIN-448) Add glob pattern blacklist in ConfigurableGlobDatasetFinder

2018-03-27 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-448:
---

 Summary: Add glob pattern blacklist in 
ConfigurableGlobDatasetFinder
 Key: GOBBLIN-448
 URL: https://issues.apache.org/jira/browse/GOBBLIN-448
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-445) Add task output directory for staging compaction result

2018-03-26 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-445:
---

 Summary: Add task output directory for staging compaction result
 Key: GOBBLIN-445
 URL: https://issues.apache.org/jira/browse/GOBBLIN-445
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-436) Salesforce doesn't have default constructor

2018-03-22 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-436:
---

 Summary: Salesforce doesn't have default constructor
 Key: GOBBLIN-436
 URL: https://issues.apache.org/jira/browse/GOBBLIN-436
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-423) Limit records or bucket counts for dynamic probing

2018-03-08 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-423:
---

 Summary: Limit records or bucket counts for dynamic probing
 Key: GOBBLIN-423
 URL: https://issues.apache.org/jira/browse/GOBBLIN-423
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-419) Add more metrics for cluster job scheduling

2018-02-27 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-419:
---

 Summary: Add more metrics for cluster job scheduling
 Key: GOBBLIN-419
 URL: https://issues.apache.org/jira/browse/GOBBLIN-419
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-416) Allow user to configure java options to launch child process for cluster task isolation

2018-02-22 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-416:
---

 Summary: Allow user to configure java options to launch child 
process for cluster task isolation
 Key: GOBBLIN-416
 URL: https://issues.apache.org/jira/browse/GOBBLIN-416
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-403) Fix the NPE issue due to uninitialized kafkajobmonitor metrics

2018-02-02 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-403:
---

 Summary: Fix the NPE issue due to uninitialized kafkajobmonitor 
metrics
 Key: GOBBLIN-403
 URL: https://issues.apache.org/jira/browse/GOBBLIN-403
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-378) Task only publish data when the state is successful in the earlier processing

2018-01-17 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-378:
---

 Summary: Task only publish data when the state is successful in 
the earlier processing
 Key: GOBBLIN-378
 URL: https://issues.apache.org/jira/browse/GOBBLIN-378
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Commented] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor

2018-01-16 Thread Kuai Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327617#comment-16327617
 ] 

Kuai Yu commented on GOBBLIN-373:
-

[~jbaranick], we are using the *StandardMetricsBridge* interface to expose 
these metrics (this PR only exposes the metrics). Inside LinkedIn, we have 
another internal project that converts these metrics to another type of 
object called Sensor, which LinkedIn uses to show metrics on many 
dashboards.

> Expose task executor auto scale metrics to external sensor
> --
>
> Key: GOBBLIN-373
> URL: https://issues.apache.org/jira/browse/GOBBLIN-373
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
>
> This is used for LinkedIn inGraph integration.





[jira] [Created] (GOBBLIN-373) Expose task executor auto scale metrics to external sensor

2018-01-16 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-373:
---

 Summary: Expose task executor auto scale metrics to external sensor
 Key: GOBBLIN-373
 URL: https://issues.apache.org/jira/browse/GOBBLIN-373
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Kuai Yu
Assignee: Kuai Yu


This is used for LinkedIn inGraph integration.





[jira] [Created] (GOBBLIN-358) Add logs for GobblinMetrics

2018-01-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-358:
---

 Summary: Add logs for GobblinMetrics
 Key: GOBBLIN-358
 URL: https://issues.apache.org/jira/browse/GOBBLIN-358
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-356) hanging when retrieving kafka schema

2018-01-03 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-356:
---

 Summary: hanging when retrieving kafka schema
 Key: GOBBLIN-356
 URL: https://issues.apache.org/jira/browse/GOBBLIN-356
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu








[jira] [Created] (GOBBLIN-349) Add gauges for gobblin cluster metrics

2017-12-15 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-349:
---

 Summary: Add gauges for gobblin cluster metrics
 Key: GOBBLIN-349
 URL: https://issues.apache.org/jira/browse/GOBBLIN-349
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu


Although we already have counter metrics, we still add gauge metrics for 
completeness: internally, LinkedIn uses a healthcheck sensor to process the 
metrics, and a counter is treated as a rate instead of a real-time number.
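
The distinction between the two metric types can be sketched in plain Java. This is only an illustration under stated assumptions: the class `CounterVsGaugeSketch` and its members are hypothetical and do not use Gobblin's or any metrics library's API. The counter only ever increases, so a downstream system that samples it tends to see a rate, while the gauge reports the current value when read.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class CounterVsGaugeSketch {
    // Counter (hypothetical): monotonically increasing total of launched jobs.
    private final AtomicLong totalJobsLaunched = new AtomicLong();

    // Backing value for the gauge: rises and falls with the live state.
    private final AtomicLong jobsRunning = new AtomicLong();

    // Gauge (hypothetical): evaluated on demand, returns the current number of running jobs.
    public final LongSupplier runningJobsGauge = jobsRunning::get;

    public void jobStarted() {
        totalJobsLaunched.incrementAndGet();
        jobsRunning.incrementAndGet();
    }

    public void jobFinished() {
        jobsRunning.decrementAndGet();
    }

    public long launchedCount() {
        return totalJobsLaunched.get();
    }

    public static void main(String[] args) {
        CounterVsGaugeSketch metrics = new CounterVsGaugeSketch();
        metrics.jobStarted();
        metrics.jobStarted();
        metrics.jobStarted();
        metrics.jobFinished();
        metrics.jobFinished();
        System.out.println("launched so far: " + metrics.launchedCount());
        System.out.println("running now: " + metrics.runningJobsGauge.getAsLong());
    }
}
```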





[jira] [Created] (GOBBLIN-326) Gobblin metrics constructor only provides default constructor for Codahale metrics

2017-11-29 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-326:
---

 Summary: Gobblin metrics constructor only provides default 
constructor for Codahale metrics
 Key: GOBBLIN-326
 URL: https://issues.apache.org/jira/browse/GOBBLIN-326
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-320) Add metrics to GobblinHelixJobScheduler

2017-11-22 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-320:
---

 Summary: Add metrics to GobblinHelixJobScheduler
 Key: GOBBLIN-320
 URL: https://issues.apache.org/jira/browse/GOBBLIN-320
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-308) Gobblin cluster bootup hangs

2017-11-07 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-308:
---

 Summary: Gobblin cluster bootup hangs
 Key: GOBBLIN-308
 URL: https://issues.apache.org/jira/browse/GOBBLIN-308
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


The problem happens when there are more than 100 files in the job catalog. 
During the boot-up sequence, the spec consumer is launched after the 
jobCatalog. However, the jobCatalog launches with a job listener that pushes 
job specs into a blocking queue, and because the spec consumer has not been 
started yet, nothing consumes job specs from the queue. Once the blocking 
queue reaches its maximum size (100 by default), the system hangs.
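
The hang can be reproduced in miniature. The following is a minimal sketch, not Gobblin's actual code: the class `BootOrderSketch`, its method names, and the choice of `ArrayBlockingQueue` are assumptions made for illustration. The first method uses `offer` rather than `put` so that it reports rejected specs instead of blocking forever; the second starts the consumer before anything is produced.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BootOrderSketch {
    // Hypothetical stand-in for the job spec queue; 100 mirrors the default size mentioned above.
    static final int QUEUE_CAPACITY = 100;

    /**
     * Pushes specCount specs while no consumer is running and returns how many
     * were rejected. With put() instead of offer(), the first rejected spec
     * would block the calling thread indefinitely.
     */
    static int rejectedWithoutConsumer(int specCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(QUEUE_CAPACITY);
        int rejected = 0;
        for (int i = 0; i < specCount; i++) {
            if (!queue.offer("jobSpec-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    /** Same load, but the consumer thread is started before the producer pushes anything. */
    static int consumedWithConsumerFirst(int specCount) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(QUEUE_CAPACITY);
        int[] consumed = {0};
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < specCount; i++) {
                    queue.take();
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 0; i < specCount; i++) {
            queue.put("jobSpec-" + i); // may block briefly, but the consumer keeps draining
        }
        consumer.join(); // join() makes the consumer's writes visible to this thread
        return consumed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected without consumer: " + rejectedWithoutConsumer(150));
        System.out.println("consumed with consumer first: " + consumedWithConsumerFirst(150));
    }
}
```

Starting the consumer first is one possible remedy; the issue itself does not say which fix was chosen.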







[jira] [Created] (GOBBLIN-303) Compaction can generate zero sized output when MR is in speculative mode

2017-11-02 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-303:
---

 Summary: Compaction can generate zero sized output when MR is in 
speculative mode
 Key: GOBBLIN-303
 URL: https://issues.apache.org/jira/browse/GOBBLIN-303
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu
Priority: Minor


Currently, if an MR job uses speculative mode, the output is very likely to 
contain a zero-sized file generated by a killed task attempt.





[jira] [Created] (GOBBLIN-277) Add a lock to make multihop thread safe

2017-10-05 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-277:
---

 Summary: Add a lock to make multihop thread safe
 Key: GOBBLIN-277
 URL: https://issues.apache.org/jira/browse/GOBBLIN-277
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-268) Unique job uri and job name generation for GaaS

2017-09-27 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-268:
---

 Summary: Unique job uri and job name generation for GaaS
 Key: GOBBLIN-268
 URL: https://issues.apache.org/jira/browse/GOBBLIN-268
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Created] (GOBBLIN-252) Add some azkaban related constants

2017-09-14 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-252:
---

 Summary: Add some azkaban related constants
 Key: GOBBLIN-252
 URL: https://issues.apache.org/jira/browse/GOBBLIN-252
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu








[jira] [Updated] (GOBBLIN-241) Allow multiple datasets send different lineage event for kafka

2017-09-11 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-241:

Summary: Allow multiple datasets send different lineage event for kafka  
(was: Add task level lineage submission for kafka lineage event support)

> Allow multiple datasets send different lineage event for kafka
> --
>
> Key: GOBBLIN-241
> URL: https://issues.apache.org/jira/browse/GOBBLIN-241
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>
> This task mainly adds to or refactors the existing lineage event support, 
> allowing the task-level publisher to submit lineage events.





[jira] [Created] (GOBBLIN-244) Need additional info for gobblin tracking hourly-deduped

2017-09-07 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-244:
---

 Summary: Need additional info for gobblin tracking hourly-deduped
 Key: GOBBLIN-244
 URL: https://issues.apache.org/jira/browse/GOBBLIN-244
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


Add the previous record count and the number of execution runs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (GOBBLIN-241) Add task level lineage submission for kafka lineage event support

2017-09-06 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-241:
---

 Summary: Add task level lineage submission for kafka lineage event 
support
 Key: GOBBLIN-241
 URL: https://issues.apache.org/jira/browse/GOBBLIN-241
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


This task mainly adds to or refactors the existing lineage event support, 
allowing the task-level publisher to submit lineage events.





[jira] [Created] (GOBBLIN-235) Prevent log warnings when TaskStateCollectorService has no task states detected

2017-09-01 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-235:
---

 Summary: Prevent log warnings when TaskStateCollectorService has 
no task states detected
 Key: GOBBLIN-235
 URL: https://issues.apache.org/jira/browse/GOBBLIN-235
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


Need to adjust log level from warning to debug





[jira] [Created] (GOBBLIN-233) Add concurrent map to avoid multiple job submission from GobblinHelixJobScheduler

2017-08-31 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-233:
---

 Summary: Add concurrent map to avoid multiple job submission from 
GobblinHelixJobScheduler 
 Key: GOBBLIN-233
 URL: https://issues.apache.org/jira/browse/GOBBLIN-233
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


The current Helix job scheduler doesn't check whether an existing job of the 
same type is already running in the queue. We need some lock-like protection 
to avoid multiple job submissions and reduce the workload on Gobblin and Helix.
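
One way to get this kind of protection from a concurrent map is an atomic check-and-insert keyed by job name. The following is a minimal sketch under stated assumptions: the class `JobSubmissionGuard` and its methods are hypothetical and are not taken from GobblinHelixJobScheduler.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class JobSubmissionGuard {
    // Job name -> marker meaning "a run of this job is currently in flight".
    private final ConcurrentMap<String, Boolean> running = new ConcurrentHashMap<>();

    /**
     * Atomically claims the job name. Returns true if the caller may submit the
     * job, false if a run with the same name is already in flight.
     */
    public boolean tryAcquire(String jobName) {
        return running.putIfAbsent(jobName, Boolean.TRUE) == null;
    }

    /** Call when the job finishes, whether it succeeded or failed. */
    public void release(String jobName) {
        running.remove(jobName);
    }

    public static void main(String[] args) {
        JobSubmissionGuard guard = new JobSubmissionGuard();
        System.out.println(guard.tryAcquire("hourly-ingest")); // first submission is allowed
        System.out.println(guard.tryAcquire("hourly-ingest")); // duplicate is refused
        guard.release("hourly-ingest");
        System.out.println(guard.tryAcquire("hourly-ingest")); // allowed again after release
    }
}
```

Because `putIfAbsent` is atomic, two scheduler threads racing on the same job name cannot both win, and no explicit lock is needed. A caller would typically wrap the submission in try/finally so that `release` always runs.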





[jira] [Commented] (GOBBLIN-214) Filtering doesn't work in FileListUtils:listFilesRecursively

2017-08-18 Thread Kuai Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133645#comment-16133645
 ] 

Kuai Yu commented on GOBBLIN-214:
-

The problem came up when we tried to use this method to filter out all AVRO 
files when a base directory was passed in as an argument. With the previous 
logic, the filter was applied only to directories, not to files. So if a 
baseDir/_schema.avsc file was present, it could not be skipped.
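
A minimal sketch of the intended behaviour, assuming the filter should be 
applied to every non-directory entry while all directories are still 
traversed. It uses java.nio.file rather than the Hadoop FileSystem API that 
FileListUtils works with, and the class name RecursiveFileLister is 
hypothetical, so treat it as an illustration and not as the actual fix.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration; not the Gobblin FileListUtils class.
class RecursiveFileLister {
    // Lists all regular entries under dir (recursively) that pass fileFilter.
    static List<Path> listFilesRecursively(Path dir, Predicate<Path> fileFilter)
            throws IOException {
        List<Path> result = new ArrayList<>();
        collect(dir, fileFilter, result);
        return result;
    }

    private static void collect(Path dir, Predicate<Path> fileFilter, List<Path> result)
            throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    // Assumption: directories are always descended into.
                    collect(child, fileFilter, result);
                } else if (fileFilter.test(child)) {
                    // The filter is applied to files, which is the step the
                    // comment above says was missing.
                    result.add(child);
                }
            }
        }
    }
}
```

With a filter such as p -> p.getFileName().toString().endsWith(".avro"), a 
baseDir/_schema.avsc file is skipped while .avro files in baseDir and its 
subdirectories are returned.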

> Filtering doesn't work in FileListUtils:listFilesRecursively
> 
>
> Key: GOBBLIN-214
> URL: https://issues.apache.org/jira/browse/GOBBLIN-214
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>
> The filtering logic for FileListUtils:listFilesRecursively was wrong. It 
> never applies the filter to non-directory files





[jira] [Updated] (GOBBLIN-214) Filtering doesn't work in FileListUtils:listFilesRecursively

2017-08-18 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu updated GOBBLIN-214:

Description: The filtering logic for FileListUtils:listFilesRecursively was 
wrong. It never applies the filter to non-directory files  
(was: The filtering logic for FileListUtils:listFilesRecursively was wrong.)

> Filtering doesn't work in FileListUtils:listFilesRecursively
> 
>
> Key: GOBBLIN-214
> URL: https://issues.apache.org/jira/browse/GOBBLIN-214
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>
> The filtering logic for FileListUtils:listFilesRecursively was wrong. It 
> never applies the filter to non-directory files





[jira] [Created] (GOBBLIN-214) Filtering doesn't work in FileListUtils:listFilesRecursively

2017-08-18 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-214:
---

 Summary: Filtering doesn't work in 
FileListUtils:listFilesRecursively
 Key: GOBBLIN-214
 URL: https://issues.apache.org/jira/browse/GOBBLIN-214
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


The filtering logic for FileListUtils:listFilesRecursively was wrong.





[jira] [Closed] (GOBBLIN-38) Create workunitstream for CompactionSource

2017-08-17 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu closed GOBBLIN-38.
--
Resolution: Fixed

This is a duplicate PR. We already have a workunit stream for CompactionSource. 
Closing this one.

> Create workunitstream for CompactionSource
> --
>
> Key: GOBBLIN-38
> URL: https://issues.apache.org/jira/browse/GOBBLIN-38
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Kuai Yu
>
> *Github Url* : https://github.com/linkedin/gobblin/pull/1826 
> *Github Reporter* : [~yukuai518] 
> *Github Created At* : 2017-05-02T22:54:52Z 
> *Github Updated At* : 2017-06-13T15:45:12Z 
> h3. Comments 
> 
> [~ibuenros] wrote on 2017-06-13T15:45:12Z : @yukuai518 what is the status of 
> this PR? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1826#issuecomment-308160180





[jira] [Assigned] (GOBBLIN-19) dataset specific properties are ignored by KafkaBiLevelWorkUnitPacker

2017-08-07 Thread Kuai Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuai Yu reassigned GOBBLIN-19:
--

Assignee: Kuai Yu
  Sprint: Apache Gobblin 170807

> dataset specific properties are ignored by KafkaBiLevelWorkUnitPacker
> -
>
> Key: GOBBLIN-19
> URL: https://issues.apache.org/jira/browse/GOBBLIN-19
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Clemens Valiente
>Assignee: Kuai Yu
>
> I failed to get dataset.specific.props to work on our jobs, and I think I 
> found the reason:
> in KafkaSource.getWorkUnitForTopicPartition the properties are added 
> correctly to the individual workunits.
> The KafkaBiLevelWorkUnitPacker then assigns the WorkUnits to their bins and 
> combines them into one WorkUnit in squeezeMultiWorkUnit() but doesn't copy 
> over the topicSpecificSettings.
> Using the KafkaSingleLevelWorkUnitPacker works fine with 
> dataset.specific.props since it doesn't call squeezeMultiWorkUnit on 
> non-empty workUnits.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1901 
> *Github Reporter* : [~cvaliente] 
> *Github Created At* : 2017-05-26T09:25:54Z 
> *Github Updated At* : 2017-05-31T06:39:04Z 
> h3. Comments 
> 
> [~cvaliente] wrote on 2017-05-26T10:55:37Z : fix in #1903  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/1901#issuecomment-304253329 
> 
> [~stakiar] wrote on 2017-05-30T17:42:07Z : Doesn't 
> `KafkaSource#addTopicSpecificPropsToWorkUnits` handle adding dataset specific 
> configuration? That method is run after the bin-packing is done. So if 
> `dataset.specific.props` isn't working I would guess the bug would be in that 
> method. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/1901#issuecomment-304953996 
> 
> [~cvaliente] wrote on 2017-05-31T06:39:04Z : You are right, that wasn't yet 
> implemented in 0.9 and I forgot to check upstream. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/1901#issuecomment-305098396


