[jira] [Resolved] (GOBBLIN-576) Send partition level lineage in hive distcp

2018-09-11 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-576.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request #2442
[https://github.com/apache/incubator-gobblin/pull/2442]

> Send partition level lineage in hive distcp
> ---
>
> Key: GOBBLIN-576
> URL: https://issues.apache.org/jira/browse/GOBBLIN-576
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.14.0
>
>
> Currently, Hive distcp only supports dataset/table-level lineage. The task is 
> to send lineage at the table-partition level when the table has partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-567) Create config store that downloads and reads from a local jar

2018-09-11 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-567.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request #2430
[https://github.com/apache/incubator-gobblin/pull/2430]

> Create config store that downloads and reads from a local jar
> -
>
> Key: GOBBLIN-567
> URL: https://issues.apache.org/jira/browse/GOBBLIN-567
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jack Moseley
>Assignee: Jack Moseley
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Commented] (GOBBLIN-589) Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event

2018-09-17 Thread Hung Tran (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618280#comment-16618280
 ] 

Hung Tran commented on GOBBLIN-589:
---

Additional changes in pull request #2457

https://github.com/apache/incubator-gobblin/pull/2457

> Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event
> --
>
> Key: GOBBLIN-589
> URL: https://issues.apache.org/jira/browse/GOBBLIN-589
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Carl Shen
>Priority: Major
> Fix For: 0.14.0
>
>
> Add start/stop fetch epoch times so that true extractor ingestion values can 
> be used instead of estimations.
> Add previous\{start/stop fetch epoch times, low/high watermarks}: saves extra 
> queries required to do lag operations for monitoring purposes.
> Add partitionTotalSize and undecodableMessageCount





[jira] [Resolved] (GOBBLIN-589) Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event

2018-09-17 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-589.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request #2455
[https://github.com/apache/incubator-gobblin/pull/2455]

> Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event
> --
>
> Key: GOBBLIN-589
> URL: https://issues.apache.org/jira/browse/GOBBLIN-589
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Carl Shen
>Priority: Major
> Fix For: 0.14.0
>
>
> Add start/stop fetch epoch times and partition total size: so that true 
> extractor ingestion values can be used instead of estimations.
> Add previous\{start/stop fetch epoch times, low/high watermarks}: saves extra 
> queries required to do lag operations for monitoring purposes.





[jira] [Resolved] (GOBBLIN-586) Feature to apply retention in remote HDFS

2018-09-17 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-586.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request #2452
[https://github.com/apache/incubator-gobblin/pull/2452]

> Feature to apply retention in remote HDFS
> -
>
> Key: GOBBLIN-586
> URL: https://issues.apache.org/jira/browse/GOBBLIN-586
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Karthik Amarnath
>Priority: Minor
> Fix For: 0.14.0
>
>
> Enhancement feature to apply retention in remote HDFS by reading the 
> configuration from local HDFS.





[jira] [Resolved] (GOBBLIN-591) Allow user to pass in the customized http client for azkaban client

2018-09-18 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-591.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request #2458
[https://github.com/apache/incubator-gobblin/pull/2458]

> Allow user to pass in the customized http client for azkaban client
> ---
>
> Key: GOBBLIN-591
> URL: https://issues.apache.org/jira/browse/GOBBLIN-591
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (GOBBLIN-671) Close the underlying writer when a HiveWritableHdfsDataWriter is closed

2019-01-24 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-671:
-

 Summary: Close the underlying writer when a 
HiveWritableHdfsDataWriter is closed
 Key: GOBBLIN-671
 URL: https://issues.apache.org/jira/browse/GOBBLIN-671
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The HiveWritableHdfsDataWriter does not close the underlying writer when 
close() is called. This results in holding onto writer resources after the 
close. For some underlying writers, such as an OrcRecordWriter, this can result 
in a large amount of memory being buffered, which leads to OOMs.
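
A minimal sketch of the intended behavior, assuming a simplified in-memory 
writer. BufferingRecordWriter and DelegatingDataWriter are hypothetical 
stand-ins for illustration, not the Gobblin or ORC classes:

```java
/** Hypothetical stand-in for an underlying writer that buffers records in memory. */
class BufferingRecordWriter {
    private final StringBuilder buffer = new StringBuilder();
    private boolean closed = false;

    void write(String record) {
        buffer.append(record).append('\n');
    }

    /** Releases the buffer; a real writer would first flush it to storage. */
    void close() {
        buffer.setLength(0);
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    int bufferedChars() {
        return buffer.length();
    }
}

/** Hypothetical wrapper showing the fix: close() is forwarded to the delegate. */
class DelegatingDataWriter {
    private final BufferingRecordWriter delegate;

    DelegatingDataWriter(BufferingRecordWriter delegate) {
        this.delegate = delegate;
    }

    void write(String record) {
        delegate.write(record);
    }

    void close() {
        // Without this call the delegate would keep its buffer alive after close().
        delegate.close();
    }
}
```

Forwarding close() is what lets the buffered memory be reclaimed once the 
wrapper is done.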





[jira] [Resolved] (GOBBLIN-721) Gobblin streaming recipe is broken

2019-04-01 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-721.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2588
[https://github.com/apache/incubator-gobblin/pull/2588]

> Gobblin streaming recipe is broken
> --
>
> Key: GOBBLIN-721
> URL: https://issues.apache.org/jira/browse/GOBBLIN-721
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-core
>Reporter: Shirshanka Das
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Running the simple streaming pull file results in a "negative acks" problem 
> because the internal pipeline has been re-architected to ack automatically. 





[jira] [Resolved] (GOBBLIN-703) Allow planning job to be run in a non-blocking way

2019-03-25 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-703.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2573
[https://github.com/apache/incubator-gobblin/pull/2573]

> Allow planning job to be run in a non-blocking way
> --
>
> Key: GOBBLIN-703
> URL: https://issues.apache.org/jira/browse/GOBBLIN-703
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Today, all planning jobs run in a dedicated thread pool, and each one waits 
> until the full execution has completed. This requires a lot of system 
> resources and a dedicated monitoring thread. The improvement here is to 
> reduce the waiting time on that dedicated monitoring thread. Once the 
> planning job is submitted to Helix, we don't need to wait for the job to 
> complete; job status monitoring will be handled by GaaS monitoring. This 
> frees most of the thread pool resources, because each monitoring thread 
> returns immediately after it finishes the job submission.
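
The difference between the blocking and non-blocking modes can be sketched as 
follows. This is an illustration only: PlanningJobLauncher and its launch 
method are hypothetical names, and the CompletableFuture stands in for 
whatever completion signal the cluster provides.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical launcher contrasting blocking and non-blocking submission. */
class PlanningJobLauncher {
    /**
     * @param completion  completes when the submitted job finishes (assumed signal)
     * @param nonBlocking when true, return right after submission
     * @return the state known to the launcher thread when it returns
     */
    static String launch(CompletableFuture<Void> completion, boolean nonBlocking) {
        if (nonBlocking) {
            // The launcher thread is released; status is tracked elsewhere.
            return "SUBMITTED";
        }
        // Blocking mode: the launcher thread is held until the job is done.
        completion.join();
        return "COMPLETED";
    }
}
```

In non-blocking mode the caller never touches the completion signal, which is 
why the thread can be returned to the pool immediately.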
>  





[jira] [Resolved] (GOBBLIN-706) Making KafkaSource dynamically determine the number of mapper

2019-03-26 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-706.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2576
[https://github.com/apache/incubator-gobblin/pull/2576]

> Making KafkaSource dynamically determine the number of mapper
> -
>
> Key: GOBBLIN-706
> URL: https://issues.apache.org/jira/browse/GOBBLIN-706
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-717) Filter Out Empty MultiWorkUnits

2019-04-03 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-717.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2584
[https://github.com/apache/incubator-gobblin/pull/2584]

> Filter Out Empty MultiWorkUnits
> ---
>
> Key: GOBBLIN-717
> URL: https://issues.apache.org/jira/browse/GOBBLIN-717
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Currently, when we run a job, Gobblin uses the max mappers value or the target 
> size of a mapper to determine the number of mappers. But since one partition 
> cannot be divided into several WorkUnits, work cannot be evenly distributed, 
> and many mappers (MultiWorkUnits) have no work to do. This wastes a lot of 
> resources. So we need to filter out MultiWorkUnits that contain no WorkUnit 
> when we determine the number of mappers.
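
The filtering step can be illustrated with plain lists. In this sketch each 
inner list plays the role of one MultiWorkUnit; WorkUnitBins and dropEmpty 
are hypothetical names, not Gobblin APIs:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: each inner list models one MultiWorkUnit (one mapper). */
class WorkUnitBins {
    /** Returns only the bins that contain at least one work unit. */
    static <T> List<List<T>> dropEmpty(List<List<T>> bins) {
        return bins.stream()
                .filter(bin -> !bin.isEmpty())
                .collect(Collectors.toList());
    }
}
```

The number of mappers would then be taken from the size of the filtered list 
rather than from the configured maximum.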





[jira] [Resolved] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-04-03 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-712.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2579
[https://github.com/apache/incubator-gobblin/pull/2579]

> Add version strategy for configbased dataset copy
> -
>
> Key: GOBBLIN-712
> URL: https://issues.apache.org/jira/browse/GOBBLIN-712
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-723) Add support to the LogCopier for copying from multiple source paths

2019-04-04 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-723.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2590
[https://github.com/apache/incubator-gobblin/pull/2590]

> Add support to the LogCopier for copying from multiple source paths
> ---
>
> Key: GOBBLIN-723
> URL: https://issues.apache.org/jira/browse/GOBBLIN-723
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The LogCopier should support multiple source paths.





[jira] [Resolved] (GOBBLIN-722) add option to unschedule a gaas flow

2019-04-04 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-722.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2589
[https://github.com/apache/incubator-gobblin/pull/2589]

> add option to unschedule a gaas flow
> 
>
> Key: GOBBLIN-722
> URL: https://issues.apache.org/jira/browse/GOBBLIN-722
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-723) Add support to the LogCopier for copying from multiple source paths

2019-04-04 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-723:
-

 Summary: Add support to the LogCopier for copying from multiple 
source paths
 Key: GOBBLIN-723
 URL: https://issues.apache.org/jira/browse/GOBBLIN-723
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The LogCopier should support multiple source paths.





[jira] [Resolved] (GOBBLIN-716) Add lineage in FileBasedSource

2019-03-29 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-716.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2583
[https://github.com/apache/incubator-gobblin/pull/2583]

> Add lineage in FileBasedSource
> --
>
> Key: GOBBLIN-716
> URL: https://issues.apache.org/jira/browse/GOBBLIN-716
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add lineage in `FileBasedSource`
> - By default, `FileBasedSource` marks dataset level source lineage
> - A `PartitionedFileSourceBase` marks partition level source lineage
> Fix destinations being overwritten in `LineageInfo.putDestination(List 
> descriptors, int branchId, State state)`. Multiple calls should append the 
> given descriptors.
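
The append-versus-overwrite behavior can be sketched with a small map-backed 
class. BranchDestinations is a hypothetical model that uses strings in place 
of descriptors and omits the State argument; it is not the LineageInfo 
implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of per-branch destinations; strings stand in for descriptors. */
class BranchDestinations {
    private final Map<Integer, List<String>> byBranch = new HashMap<>();

    /** Appends to whatever was recorded earlier for the branch instead of replacing it. */
    void putDestination(List<String> descriptors, int branchId) {
        byBranch.computeIfAbsent(branchId, id -> new ArrayList<>()).addAll(descriptors);
    }

    /** Returns a copy of the destinations recorded for the branch, in insertion order. */
    List<String> getDestinations(int branchId) {
        List<String> recorded = byBranch.get(branchId);
        return recorded == null ? new ArrayList<String>() : new ArrayList<String>(recorded);
    }
}
```

A plain put(branchId, descriptors) would keep only the last call's 
descriptors, which is the overwrite the issue describes.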





[jira] [Resolved] (GOBBLIN-709) Provide an option to disallow concurrent flow executions in Gobblin-as-a-Service

2019-03-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-709.
---
Resolution: Fixed

Issue resolved by pull request #2580
[https://github.com/apache/incubator-gobblin/pull/2580]

> Provide an option to disallow concurrent flow executions in 
> Gobblin-as-a-Service
> 
>
> Key: GOBBLIN-709
> URL: https://issues.apache.org/jira/browse/GOBBLIN-709
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Provide an option to disallow concurrent flow executions in 
> Gobblin-as-a-Service.





[jira] [Resolved] (GOBBLIN-713) Lazy load job specification from job catalog to avoid OOM issue when JobCatalog is bootup.

2019-03-27 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-713.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2581
[https://github.com/apache/incubator-gobblin/pull/2581]

> Lazy load job specification from job catalog to avoid OOM issue when 
> JobCatalog is bootup.
> --
>
> Key: GOBBLIN-713
> URL: https://issues.apache.org/jira/browse/GOBBLIN-713
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Today, whenever the job catalog is restarted, all the job specs are loaded into 
> memory. This can cause OOM issues under our production load. This ticket was 
> created to provide an easy way to load a job spec without materializing all 
> the job specs into memory.
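
One common way to avoid materializing everything at once is an iterator that 
loads each spec only when it is requested. The sketch below shows that 
general technique; LazySpecIterator and the loader function are hypothetical, 
and the pull request may take a different approach:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical iterator that loads one spec per next() call instead of all up front. */
class LazySpecIterator<S> implements Iterator<S> {
    private final Iterator<String> uris;
    private final Function<String, S> loader;

    LazySpecIterator(List<String> specUris, Function<String, S> loader) {
        this.uris = specUris.iterator();
        this.loader = loader;
    }

    @Override
    public boolean hasNext() {
        return uris.hasNext();
    }

    @Override
    public S next() {
        // The (potentially expensive) load happens here, one spec at a time.
        return loader.apply(uris.next());
    }
}
```

Only the list of spec URIs is held in memory; each loaded spec can be 
garbage-collected once the caller is done with it.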





[jira] [Resolved] (GOBBLIN-690) Relaunch check for the planning job is not correct

2019-02-25 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-690.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2562
[https://github.com/apache/incubator-gobblin/pull/2562]

> Relaunch check for the planning job is not correct
> --
>
> Key: GOBBLIN-690
> URL: https://issues.apache.org/jira/browse/GOBBLIN-690
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-692) Add support to query last K flow executions in Gobblin-as-a-Service (GaaS)

2019-02-27 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-692.
---
Resolution: Fixed

Issue resolved by pull request #2564
[https://github.com/apache/incubator-gobblin/pull/2564]

> Add support to query last K flow executions in Gobblin-as-a-Service (GaaS)
> --
>
> Key: GOBBLIN-692
> URL: https://issues.apache.org/jira/browse/GOBBLIN-692
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, REST APIs only support retrieving the latest execution of a flow. 
> We enhance the APIs to query the last K executions where K is passed as a 
> query parameter.
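
The "last K" selection itself can be sketched as below. This assumes 
execution IDs increase over time (for example, timestamps), which the issue 
does not state; FlowExecutionQuery and latest are hypothetical names and do 
not reflect the actual REST API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper selecting the K most recent executions of a flow. */
class FlowExecutionQuery {
    /** Returns up to k execution IDs, newest first (assumes larger ID = newer). */
    static List<Long> latest(List<Long> executionIds, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<Long> sorted = new ArrayList<>(executionIds);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

With k = 1 this reduces to the existing "latest execution" behavior.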





[jira] [Created] (GOBBLIN-693) Add ORC hive serde manager

2019-03-01 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-693:
-

 Summary: Add ORC hive serde manager
 Key: GOBBLIN-693
 URL: https://issues.apache.org/jira/browse/GOBBLIN-693
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Add an ORC hive serde manager to register ORC datasets in Hive.





[jira] [Resolved] (GOBBLIN-677) Allow for early termination of Gobblin jobs based on a predicate on job progress

2019-03-04 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-677.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2548
[https://github.com/apache/incubator-gobblin/pull/2548]

> Allow for early termination of Gobblin jobs based on a predicate on job 
> progress
> 
>
> Key: GOBBLIN-677
> URL: https://issues.apache.org/jira/browse/GOBBLIN-677
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-684) Ensure buffered messages are flushed before close() in KafkaProducerPusher

2019-02-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-684.
---
Resolution: Fixed

Issue resolved by pull request #2556
[https://github.com/apache/incubator-gobblin/pull/2556]

> Ensure buffered messages are flushed before close() in KafkaProducerPusher
> --
>
> Key: GOBBLIN-684
> URL: https://issues.apache.org/jira/browse/GOBBLIN-684
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-metrics
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Sudarshan Vasudevan
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, when KafkaProducerPusher is closed, it invokes 
> KafkaProducer#close(). However, close() only guarantees delivery of in-flight 
> messages, not the messages in the producer buffer waiting to be sent out. 
> This results in data loss.
> The fix ensures that we call flush() before close(). As a result, any 
> buffered messages are immediately pushed out and we block until the messages 
> are acked. 
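
The flush-then-close ordering can be illustrated with a simplified model. 
BufferedProducer below is a hypothetical stand-in whose close() discards 
anything still buffered, mirroring the behavior described in this issue; it 
is not the Kafka client, and FlushingPusher is not the Gobblin class:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical producer: send() buffers, flush() delivers, close() drops the buffer. */
class BufferedProducer {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean closed = false;

    void send(String message) {
        if (closed) {
            throw new IllegalStateException("producer is closed");
        }
        buffer.add(message);
    }

    void flush() {
        delivered.addAll(buffer);
        buffer.clear();
    }

    void close() {
        buffer.clear(); // anything not yet flushed is lost in this model
        closed = true;
    }

    List<String> delivered() {
        return delivered;
    }
}

/** Hypothetical pusher showing the fix: flush() is always called before close(). */
class FlushingPusher {
    private final BufferedProducer producer;

    FlushingPusher(BufferedProducer producer) {
        this.producer = producer;
    }

    void push(List<String> messages) {
        for (String message : messages) {
            producer.send(message);
        }
    }

    void close() {
        producer.flush(); // deliver buffered messages first
        producer.close();
    }
}
```

Calling producer.close() alone in this model would leave delivered() empty, 
which corresponds to the data loss described above.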
>  





[jira] [Resolved] (GOBBLIN-669) Configuration Properties Glossary section of Docs hard to read

2019-02-21 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-669.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2538
[https://github.com/apache/incubator-gobblin/pull/2538]

> Configuration Properties Glossary section of Docs hard to read
> --
>
> Key: GOBBLIN-669
> URL: https://issues.apache.org/jira/browse/GOBBLIN-669
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Christian Soseman
>Priority: Trivial
>  Labels: documentation
> Fix For: 0.15.0
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> The following section of the documentation is really hard to comb through:
> [https://gobblin.readthedocs.io/en/latest/user-guide/Configuration-Properties-Glossary/]
>  
> I believe that tables would work much better and make it easier to find 
> properties on this page. The current setup makes it hard to tell where one 
> property starts and the next one stops.
>  
> *I'm currently working on a resolution and will submit the PR for approval. 
> No other resources are required for this particular task outside of review.*





[jira] [Resolved] (GOBBLIN-666) Data too long for column 'property_key'

2019-02-21 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-666.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2539
[https://github.com/apache/incubator-gobblin/pull/2539]

> Data too long for column 'property_key'
> ---
>
> Key: GOBBLIN-666
> URL: https://issues.apache.org/jira/browse/GOBBLIN-666
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: state-management
>Affects Versions: 0.14.0
>Reporter: Francis Laforge
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We may get the following error:
> {noformat}
> com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 
> 'property_key' at row 1
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3876)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814)
>     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2478)
>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551)
>     at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
>     at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
>     at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009)
>     at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5094)
>     at 
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1994)
>     at 
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
>    at 
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
>     at 
> org.apache.gobblin.metastore.database.DatabaseJobHistoryStoreV100.updateProperty(DatabaseJobHistoryStoreV100.java:523)
>     at 
> org.apache.gobblin.metastore.database.DatabaseJobHistoryStoreV100.put(DatabaseJobHistoryStoreV100.java:244)
>     at 
> org.apache.gobblin.metastore.DatabaseJobHistoryStore.put(DatabaseJobHistoryStore.java:77)
>     at 
> org.apache.gobblin.runtime.JobContext.storeJobExecutionInfo(JobContext.java:406)
>     at 
> org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:490)
>     at 
> org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:479)
>     at 
> org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:435)
>     at 
> org.apache.gobblin.scheduler.JobScheduler$GobblinJob.executeImpl(JobScheduler.java:598)
>     at 
> org.apache.gobblin.scheduler.BaseGobblinJob.execute(BaseGobblinJob.java:58)
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>     at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573){noformat}
> when property_key is too long. Unfortunately, some keys are automatically 
> generated and can be very long. For example, when using Parquet with 
> partitioning, we may have this type of key: 
> {noformat}
> construct.final.state.FORK_OPERATOR.0.WRITER.RecordsWritten_partition1=val_name_1/partition2=2017-12-08/partition3=8/partition4=22812
> {noformat}
> which is longer than the 128 characters allowed by the MySQL table definition.
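
The length problem can be checked directly against the example key. The 
sketch below is a hypothetical pre-insert check, not the fix applied in the 
pull request; the 128-character limit is taken from the report above, and 
PropertyKeyCheck is an illustrative name:

```java
/** Hypothetical check of a property key against the column width from the report. */
class PropertyKeyCheck {
    /** Column width as stated in the report; the actual schema may differ. */
    static final int MAX_KEY_LENGTH = 128;

    /** Returns true when the key would fit in the property_key column. */
    static boolean fits(String key) {
        return key.length() <= MAX_KEY_LENGTH;
    }
}
```

Applied to the generated key quoted above, fits() returns false, which is 
consistent with the MysqlDataTruncation error.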





[jira] [Resolved] (GOBBLIN-682) Create a new constructor for DatasetCleanerJob

2019-02-21 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-682.
---
Resolution: Fixed

Issue resolved by pull request #2554
[https://github.com/apache/incubator-gobblin/pull/2554]

> Create a new constructor for DatasetCleanerJob
> --
>
> Key: GOBBLIN-682
> URL: https://issues.apache.org/jira/browse/GOBBLIN-682
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-azkaban
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Sudarshan Vasudevan
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current DatasetCleanerJob constructor only accepts config passed as 
> azkaban.utils.Props. Here, we implement a new constructor that accepts 
> java.util.Properties.
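
The overloading pattern can be sketched as follows. PropsLike is a 
hypothetical stand-in for azkaban.utils.Props (whose real API is richer), 
CleanerJobSketch is not the actual DatasetCleanerJob, and "retention.days" in 
the usage note is a made-up key:

```java
import java.util.Properties;

/** Hypothetical stand-in for azkaban.utils.Props. */
class PropsLike {
    private final Properties values = new Properties();

    PropsLike put(String key, String value) {
        values.setProperty(key, value);
        return this;
    }

    Properties toProperties() {
        Properties copy = new Properties();
        copy.putAll(values);
        return copy;
    }
}

/** Hypothetical job with two constructors sharing one initialization path. */
class CleanerJobSketch {
    private final Properties config = new Properties();

    /** Existing style: config arrives wrapped in the scheduler's own type. */
    CleanerJobSketch(String jobId, PropsLike props) {
        this(jobId, props.toProperties());
    }

    /** New style: plain java.util.Properties, usable outside the scheduler. */
    CleanerJobSketch(String jobId, Properties properties) {
        config.putAll(properties);
    }

    String get(String key) {
        return config.getProperty(key);
    }
}
```

Delegating the first constructor to the second keeps both entry points 
consistent, for example new CleanerJobSketch("cleaner", properties) with 
"retention.days" set to "7".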





[jira] [Resolved] (GOBBLIN-698) Enhance logging to print job and flow details when a job is orchestrated by GaaS

2019-03-05 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-698.
---
Resolution: Fixed

Issue resolved by pull request #2569
[https://github.com/apache/incubator-gobblin/pull/2569]

> Enhance logging to print job and flow details when a job is orchestrated by 
> GaaS
> 
>
> Key: GOBBLIN-698
> URL: https://issues.apache.org/jira/browse/GOBBLIN-698
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We enhance logging in GaaS to add job and flow details when a job is 
> orchestrated by GaaS.





[jira] [Resolved] (GOBBLIN-691) Make compaction implementation format-insensitive

2019-03-05 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-691.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2563
[https://github.com/apache/incubator-gobblin/pull/2563]

> Make compaction implementation format-insensitive
> -
>
> Key: GOBBLIN-691
> URL: https://issues.apache.org/jira/browse/GOBBLIN-691
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-688) Make FsJobStatusRetriever config more scoped

2019-02-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-688.
---
Resolution: Fixed

Issue resolved by pull request #2560
[https://github.com/apache/incubator-gobblin/pull/2560]

> Make FsJobStatusRetriever config more scoped
> 
>
> Key: GOBBLIN-688
> URL: https://issues.apache.org/jira/browse/GOBBLIN-688
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The proposed enhancement adds a configuration prefix to FsJobStatusRetriever 
> config to make it more scoped.





[jira] [Resolved] (GOBBLIN-683) Azkaban client should retry if session gets expired

2019-02-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-683.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2555
[https://github.com/apache/incubator-gobblin/pull/2555]

> Azkaban client should retry if session gets expired
> ---
>
> Key: GOBBLIN-683
> URL: https://issues.apache.org/jira/browse/GOBBLIN-683
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-687) Pass TopologySpec map to DagManager to allow reuse of SpecExecutors during DAG deserialization

2019-02-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-687.
---
Resolution: Fixed

Issue resolved by pull request #2559
[https://github.com/apache/incubator-gobblin/pull/2559]

> Pass TopologySpec map to DagManager to allow reuse of SpecExecutors during 
> DAG deserialization
> --
>
> Key: GOBBLIN-687
> URL: https://issues.apache.org/jira/browse/GOBBLIN-687
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> DagManager maintains state of all currently executing DAGs, by serializing 
> each DAG on compilation and persisting it to a durable store. The serialized 
> DAG includes Job config as well as the SpecExecutor config for each job in 
> the DAG. This is done to correctly resume execution of DAGs in case of 
> service restarts or leadership change. 
> Currently, on service restart/leadership change, the new master de-serializes 
> SpecExecutor config and creates a SpecExecutor instance for each job in the 
> DAG. If the number of DAGs is large, this can result in many connections to 
> the underlying executor instance. The proposed fix allows the DagManager to 
> re-use the SpecExecutor instances created by the TopologySpecFactory when it 
> deserializes a DAG.
>  
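
The reuse described above can be sketched roughly as follows. This is a minimal illustration only, not the code from the pull request: ExecutorHandle and ExecutorResolver are hypothetical names, and keying the lookup by URI is an assumption made for the example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an executor that holds a connection. */
final class ExecutorHandle {
  final URI uri;

  ExecutorHandle(URI uri) {
    this.uri = uri;
  }
}

/** Hypothetical resolver used while deserializing a DAG. */
final class ExecutorResolver {
  // Instances that already exist, keyed by URI (the key choice is an assumption).
  private final Map<URI, ExecutorHandle> known = new HashMap<>();

  /** Registers an instance that was created elsewhere, e.g. at startup. */
  void register(ExecutorHandle handle) {
    known.put(handle.uri, handle);
  }

  /** Returns the known instance for the URI, creating one only if none exists. */
  ExecutorHandle resolve(URI uri) {
    return known.computeIfAbsent(uri, ExecutorHandle::new);
  }
}
```

With a lookup like this, deserializing many DAGs that point at the same executor yields one shared instance instead of one instance (and one set of connections) per job.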





[jira] [Resolved] (GOBBLIN-686) handle schema mismatch in compatible schemas

2019-02-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-686.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2558
[https://github.com/apache/incubator-gobblin/pull/2558]

> handle schema mismatch in compatible schemas
> 
>
> Key: GOBBLIN-686
> URL: https://issues.apache.org/jira/browse/GOBBLIN-686
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-696) Provide an "explain" option to return a compiled flow when a flow config is added.

2019-03-17 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-696.
---
Resolution: Fixed

Issue resolved by pull request #2567
[https://github.com/apache/incubator-gobblin/pull/2567]

> Provide an "explain" option to return a compiled flow when a flow config is 
> added.
> --
>
> Key: GOBBLIN-696
> URL: https://issues.apache.org/jira/browse/GOBBLIN-696
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We add support for an "explain" option in Gobblin-as-a-Service (GaaS) flow 
> creation requests to return the expected output of flow compilation. The 
> "explain" option allows end users to validate their FlowConfig requests by 
> ensuring that (1) the request results in a successful compilation and (2) 
> the compiled output is as expected. Further, the "explain" option allows 
> users to query GaaS without any side effects, i.e. no FlowSpecs are actually 
> created or scheduled on GaaS.





[jira] [Resolved] (GOBBLIN-704) Adding serde props for ORCSerDe initialization

2019-03-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-704.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2574
[https://github.com/apache/incubator-gobblin/pull/2574]

> Adding serde props for ORCSerDe initialization
> --
>
> Key: GOBBLIN-704
> URL: https://issues.apache.org/jira/browse/GOBBLIN-704
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-689) catch unchecked exceptions in KafkaSource

2019-03-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-689.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2561
[https://github.com/apache/incubator-gobblin/pull/2561]

> catch unchecked exceptions in KafkaSource
> -
>
> Key: GOBBLIN-689
> URL: https://issues.apache.org/jira/browse/GOBBLIN-689
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-702) Fix bug by reusable OrcStruct

2019-03-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-702.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2572
[https://github.com/apache/incubator-gobblin/pull/2572]

> Fix bug by reusable OrcStruct 
> --
>
> Key: GOBBLIN-702
> URL: https://issues.apache.org/jira/browse/GOBBLIN-702
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-697) Allow distcp to carry over file version independently of modtime

2019-03-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-697.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2568
[https://github.com/apache/incubator-gobblin/pull/2568]

> Allow distcp to carry over file version independently of modtime
> 
>
> Key: GOBBLIN-697
> URL: https://issues.apache.org/jira/browse/GOBBLIN-697
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> One example where this is useful is data syncing between two locations. 
> Relying on modification times to detect data changes may lead to a feedback 
> loop of copying: data is created at location A at time 0; at time 1 the data 
> is copied to location B; the sync mechanism might then incorrectly conclude 
> that, because the mod time at location B is higher, the data should be synced 
> back to location A, and so on.
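
The feedback loop can be illustrated with a small sketch. The class and method names below (FileMeta, SyncDecision) are hypothetical and are not taken from the distcp code; the sketch only assumes that a copy keeps the source version while receiving a fresh modification time.

```java
/** Hypothetical file metadata: a modification time plus an explicit version. */
final class FileMeta {
  final long modTime;
  final long version;

  FileMeta(long modTime, long version) {
    this.modTime = modTime;
    this.version = version;
  }
}

/** Hypothetical sync policies, for illustration only. */
final class SyncDecision {
  /** Mod-time policy: copy whenever the source looks newer than the target. */
  static boolean shouldCopyByModTime(FileMeta source, FileMeta target) {
    return target == null || source.modTime > target.modTime;
  }

  /** Version policy: copy only when the source carries a higher version. */
  static boolean shouldCopyByVersion(FileMeta source, FileMeta target) {
    return target == null || source.version > target.version;
  }

  /** A copy keeps the source version but is stamped with the copy time. */
  static FileMeta copy(FileMeta source, long copyTime) {
    return new FileMeta(copyTime, source.version);
  }
}
```

If A is created at time 0 with version 1 and copied to B at time 1, then shouldCopyByModTime(B, A) is true, which is the spurious sync back to A, whereas shouldCopyByVersion(B, A) is false because both sides carry version 1.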





[jira] [Resolved] (GOBBLIN-695) Add tools for generating binary files in avro/orc using json

2019-03-07 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-695.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2566
[https://github.com/apache/incubator-gobblin/pull/2566]

> Add tools for generating binary files in avro/orc using json
> 
>
> Key: GOBBLIN-695
> URL: https://issues.apache.org/jira/browse/GOBBLIN-695
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Port from an internal product that uses Gobblin.





[jira] [Resolved] (GOBBLIN-705) Create method to merge table props from existing hive meta table

2019-03-21 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-705.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2575
[https://github.com/apache/incubator-gobblin/pull/2575]

> Create method to merge table props from existing hive meta table
> 
>
> Key: GOBBLIN-705
> URL: https://issues.apache.org/jira/browse/GOBBLIN-705
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-680) Enhance error handling on task creation

2019-02-08 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-680.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2551
[https://github.com/apache/incubator-gobblin/pull/2551]

> Enhance error handling on task creation 
> 
>
> Key: GOBBLIN-680
> URL: https://issues.apache.org/jira/browse/GOBBLIN-680
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Assignee: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-681) increase max allowed size of a job name

2019-02-08 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-681.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2552
[https://github.com/apache/incubator-gobblin/pull/2552]

> increase max allowed size of a job name
> ---
>
> Key: GOBBLIN-681
> URL: https://issues.apache.org/jira/browse/GOBBLIN-681
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Assignee: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-679) Refactor cluster task metrics

2019-02-11 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-679.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2553
[https://github.com/apache/incubator-gobblin/pull/2553]

> Refactor cluster task metrics
> -
>
> Key: GOBBLIN-679
> URL: https://issues.apache.org/jira/browse/GOBBLIN-679
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-685) Add jstack when timeout happens in EmbeddedGobblin

2019-02-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-685.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2557
[https://github.com/apache/incubator-gobblin/pull/2557]

> Add jstack when timeout happens in EmbeddedGobblin
> --
>
> Key: GOBBLIN-685
> URL: https://issues.apache.org/jira/browse/GOBBLIN-685
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Assignee: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-676) Add record metadata support to the RecordEnvelope

2019-02-05 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-676.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2546
[https://github.com/apache/incubator-gobblin/pull/2546]

> Add record metadata support to the RecordEnvelope
> -
>
> Key: GOBBLIN-676
> URL: https://issues.apache.org/jira/browse/GOBBLIN-676
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>
> The RecordEnvelope currently only has a watermark. Add a Map to it to store 
> record-level metadata.
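
A minimal sketch of the idea, assuming nothing about the real RecordEnvelope API: SketchEnvelope and its methods are hypothetical, and the watermark is simplified to a long for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical envelope: a record, a watermark, and a metadata map. */
final class SketchEnvelope<D> {
  private final D record;
  private final long watermark; // simplified to a long for illustration
  private final Map<String, Object> metadata = new HashMap<>();

  SketchEnvelope(D record, long watermark) {
    this.record = record;
    this.watermark = watermark;
  }

  D getRecord() {
    return record;
  }

  long getWatermark() {
    return watermark;
  }

  void setMetadata(String key, Object value) {
    metadata.put(key, value);
  }

  Object getMetadata(String key) {
    return metadata.get(key);
  }

  /** Wraps a converted record while carrying the watermark and metadata along. */
  <T> SketchEnvelope<T> withRecord(T newRecord) {
    SketchEnvelope<T> out = new SketchEnvelope<>(newRecord, watermark);
    out.metadata.putAll(metadata);
    return out;
  }
}
```

Carrying the map through withRecord() is the design point of the sketch: metadata attached early in the pipeline stays available after the record itself has been converted.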





[jira] [Resolved] (GOBBLIN-671) Close the underlying writer when a HiveWritableHdfsDataWriter is closed

2019-01-25 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-671.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2541
[https://github.com/apache/incubator-gobblin/pull/2541]

> Close the underlying writer when a HiveWritableHdfsDataWriter is closed
> ---
>
> Key: GOBBLIN-671
> URL: https://issues.apache.org/jira/browse/GOBBLIN-671
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>
> The HiveWritableHdfsDataWriter does not close the underlying writer 
> when close() is called. This results in holding onto writer resources after 
> the close. For some underlying writers, such as an OrcRecordWriter, this can 
> result in a large amount of memory being buffered, which leads to OOMs.
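
The fix pattern can be sketched as below. BufferingWriter and DelegatingWriter are hypothetical stand-ins, not the Gobblin classes; the sketch only shows a wrapper whose close() is forwarded to the writer it wraps.

```java
/** Hypothetical underlying writer that buffers data until it is closed. */
final class BufferingWriter implements AutoCloseable {
  private StringBuilder buffer = new StringBuilder();

  void write(String record) {
    buffer.append(record);
  }

  boolean isReleased() {
    return buffer == null;
  }

  @Override
  public void close() {
    buffer = null; // drop the buffered data so it can be garbage collected
  }
}

/** Hypothetical wrapper whose close() also closes the writer it wraps. */
final class DelegatingWriter implements AutoCloseable {
  private final BufferingWriter underlying;
  private boolean closed = false;

  DelegatingWriter(BufferingWriter underlying) {
    this.underlying = underlying;
  }

  void write(String record) {
    if (closed) {
      throw new IllegalStateException("writer is closed");
    }
    underlying.write(record);
  }

  @Override
  public void close() {
    if (closed) {
      return; // repeated close() calls are harmless
    }
    closed = true;
    underlying.close(); // without this call the buffer would stay reachable
  }
}
```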





[jira] [Resolved] (GOBBLIN-628) Zuora Source

2019-02-01 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-628.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2498
[https://github.com/apache/incubator-gobblin/pull/2498]

> Zuora Source
> 
>
> Key: GOBBLIN-628
> URL: https://issues.apache.org/jira/browse/GOBBLIN-628
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Abhishek Tiwari
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>
> Zuora Source connector.





[jira] [Resolved] (GOBBLIN-673) Implement a FS based JobStatusRetriever for GaaS Flows.

2019-02-05 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-673.
---
Resolution: Fixed

Issue resolved by pull request #2545
[https://github.com/apache/incubator-gobblin/pull/2545]

> Implement a FS based JobStatusRetriever for GaaS Flows.
> ---
>
> Key: GOBBLIN-673
> URL: https://issues.apache.org/jira/browse/GOBBLIN-673
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Sudarshan Vasudevan
>Priority: Major
> Fix For: 0.15.0
>
>
> This PR implements a FileSystem based JobStatusRetriever that makes use of 
> the StateStore interface. The PR also implements a KafkaJobStatusMonitor that 
> pulls tracking events from Kafka and writes them to an FSStateStore. The 
> FSJobStatusRetriever can then be used to query the status of jobs/flows from 
> the state store. A StateStoreCleaner thread is scheduled by the Job status 
> monitor to clean up the state store as configured by the retention config. 





[jira] [Resolved] (GOBBLIN-675) Enhance FSDatasetDescriptor definition to include partition config, encryption level and compaction config.

2019-02-05 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-675.
---
Resolution: Fixed

Issue resolved by pull request #2544
[https://github.com/apache/incubator-gobblin/pull/2544]

> Enhance FSDatasetDescriptor definition to include partition config, 
> encryption level and compaction config.
> ---
>
> Key: GOBBLIN-675
> URL: https://issues.apache.org/jira/browse/GOBBLIN-675
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-service
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Sudarshan Vasudevan
>Priority: Major
> Fix For: 0.15.0
>
>
> Enhance FSDatasetDescriptor definition to include:
>  # partition config of the dataset (e.g. datetime, regex etc.),
>  # encryption level config (e.g. file, row, field), and
>  # compaction config (e.g. plain compaction, compaction with de-dup).





[jira] [Created] (GOBBLIN-676) Add record metadata support to the RecordEnvelope

2019-02-05 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-676:
-

 Summary: Add record metadata support to the RecordEnvelope
 Key: GOBBLIN-676
 URL: https://issues.apache.org/jira/browse/GOBBLIN-676
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The RecordEnvelope currently only has a watermark. Add a Map to it to store 
record-level metadata.





[jira] [Resolved] (GOBBLIN-727) Skip commit in CloseOnFlushWriterWrapper if a commit has already been invoked on the underlying writer.

2019-04-08 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-727.
---
Resolution: Fixed

Issue resolved by pull request #2594
[https://github.com/apache/incubator-gobblin/pull/2594]

> Skip commit in CloseOnFlushWriterWrapper if a commit has already been invoked 
> on the underlying writer.
> ---
>
> Key: GOBBLIN-727
> URL: https://issues.apache.org/jira/browse/GOBBLIN-727
> Project: Apache Gobblin
>  Issue Type: Bug
>  Components: gobblin-core
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We skip commit() on the underlying writer if a commit has been previously 
> invoked. Currently, duplicate commits on the underlying writer can result in 
> exceptions due to non-existent data in the task staging location (as in the 
> case of an FsDataWriter). 
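
The guard can be sketched as follows. OneShotWriter and CommitOnceWrapper are hypothetical names used only for illustration; the real CloseOnFlushWriterWrapper may track the commit differently.

```java
/** Hypothetical writer whose commit() fails once staging data is gone. */
final class OneShotWriter {
  private boolean staged = true;
  int commits = 0;

  void commit() {
    if (!staged) {
      throw new IllegalStateException("no data in staging location");
    }
    staged = false; // data has been moved out of staging
    commits++;
  }
}

/** Hypothetical wrapper that forwards commit() at most once. */
final class CommitOnceWrapper {
  private final OneShotWriter underlying;
  private boolean committed = false;

  CommitOnceWrapper(OneShotWriter underlying) {
    this.underlying = underlying;
  }

  void commit() {
    if (committed) {
      return; // skip the duplicate commit
    }
    underlying.commit();
    committed = true; // set only after the first commit succeeded
  }
}
```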





[jira] [Created] (GOBBLIN-737) Add support for Helix quota-based task scheduling

2019-04-14 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-737:
-

 Summary: Add support for Helix quota-based task scheduling
 Key: GOBBLIN-737
 URL: https://issues.apache.org/jira/browse/GOBBLIN-737
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Support configuring Helix quota-based task scheduling through gobblin cluster 
configuration. The gobblin cluster config key 
"gobblin.cluster.helixTaskQuotaConfig" is added to store a value in the format 
"quota_type1:quota_value1,quota_type2:quota_value2,...". The config values are 
parsed and propagated to the Helix cluster configuration.
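
As an illustration of the value format, the sketch below parses a string such as "quota_type1:quota_value1,quota_type2:quota_value2". QuotaConfigParser is a hypothetical helper, and treating quota values as integers is an assumption; the actual parsing in the pull request may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for "type:value,type:value" quota strings. */
final class QuotaConfigParser {
  /** Returns quota types mapped to values, in the order they were listed. */
  static Map<String, Integer> parse(String value) {
    Map<String, Integer> quotas = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return quotas;
    }
    for (String entry : value.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 2 || parts[0].trim().isEmpty()) {
        throw new IllegalArgumentException(
            "Expected quota_type:quota_value but got: " + entry);
      }
      // Integer values are an assumption made for this sketch.
      quotas.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }
    return quotas;
  }
}
```

For example, QuotaConfigParser.parse("DEFAULT:10,BACKUP:2") yields a map with two entries; the quota type names here are placeholders.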





[jira] [Created] (GOBBLIN-736) Skip flush and control message handlers on closed writers in the CloseOnFlushWriterWrapper

2019-04-14 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-736:
-

 Summary: Skip flush and control message handlers on closed writers 
in the CloseOnFlushWriterWrapper
 Key: GOBBLIN-736
 URL: https://issues.apache.org/jira/browse/GOBBLIN-736
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The CloseOnFlushWriterWrapper should not operate on the underlying writer after 
it is closed.





[jira] [Resolved] (GOBBLIN-737) Add support for Helix quota-based task scheduling

2019-04-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-737.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2604
[https://github.com/apache/incubator-gobblin/pull/2604]

> Add support for Helix quota-based task scheduling
> -
>
> Key: GOBBLIN-737
> URL: https://issues.apache.org/jira/browse/GOBBLIN-737
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support configuring Helix quota-based task scheduling through gobblin cluster 
> configuration. The gobblin cluster config key 
> "gobblin.cluster.helixTaskQuotaConfig" is added to store a value in the 
> format "quota_type1:quota_value1,quota_type2:quota_value2,...". The config 
> values are parsed and propagated to the Helix cluster configuration.





[jira] [Resolved] (GOBBLIN-739) Add a way to propagate the Azkaban job config to Gobblin on YARN

2019-04-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-739.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2606
[https://github.com/apache/incubator-gobblin/pull/2606]

> Add a way to propagate the Azkaban job config to Gobblin on YARN
> 
>
> Key: GOBBLIN-739
> URL: https://issues.apache.org/jira/browse/GOBBLIN-739
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application 
> master on YARN, which then loads configuration from an application.conf file. 
> Currently, the application.conf is pre-generated and packaged with the 
> Azkaban job zip. This results in duplication of config between the Azkaban 
> job properties and the application.conf file. It also doesn't allow user 
> overrides in the Azkaban UI to be propagated to the app master and containers.
> A config should be added to specify an output path to which the Azkaban job 
> config is written in HOCON format. Gobblin YARN configs such as 
> gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local 
> can then be set to point to the output file.





[jira] [Created] (GOBBLIN-732) Pass UGI credentials to the app master and load dynamic config in workers

2019-04-11 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-732:
-

 Summary: Pass UGI credentials to the app master and load dynamic 
config in workers
 Key: GOBBLIN-732
 URL: https://issues.apache.org/jira/browse/GOBBLIN-732
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Credentials available in the Azkaban application launcher need to be passed to 
the Gobblin application master for distribution to the workers. The workers 
also need to load dynamic config based on the credentials.





[jira] [Resolved] (GOBBLIN-729) Add version strategy support for HiveDataset copy

2019-04-12 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-729.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2596
[https://github.com/apache/incubator-gobblin/pull/2596]

> Add version strategy support for HiveDataset copy
> -
>
> Key: GOBBLIN-729
> URL: https://issues.apache.org/jira/browse/GOBBLIN-729
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This PR adds version strategy support for Hive dataset copy.





[jira] [Resolved] (GOBBLIN-733) Instrument Avro Converters to allow converter metrics emission in both batch and streaming modes

2019-04-12 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-733.
---
Resolution: Fixed

Issue resolved by pull request #2600
[https://github.com/apache/incubator-gobblin/pull/2600]

> Instrument Avro Converters to allow converter metrics emission in both batch 
> and streaming modes
> 
>
> Key: GOBBLIN-733
> URL: https://issues.apache.org/jira/browse/GOBBLIN-733
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-core
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Abhishek Tiwari
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We change Avro converters to extend InstrumentedConverters which will allow 
> converter metrics to be emitted in both batch and streaming modes of 
> execution. 





[jira] [Resolved] (GOBBLIN-732) Pass UGI credentials to the app master and load dynamic config in workers

2019-04-11 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-732.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2599
[https://github.com/apache/incubator-gobblin/pull/2599]

> Pass UGI credentials to the app master and load dynamic config in workers
> -
>
> Key: GOBBLIN-732
> URL: https://issues.apache.org/jira/browse/GOBBLIN-732
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Credentials available in the Azkaban application launcher need to be passed 
> to the Gobblin application master for distribution to the workers. The 
> workers also need to load dynamic config based on the credentials.





[jira] [Resolved] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory

2019-05-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-770.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2634
[https://github.com/apache/incubator-gobblin/pull/2634]

> Add JVM configuration to avoid exhausting YARN container memory 
> 
>
> Key: GOBBLIN-770
> URL: https://issues.apache.org/jira/browse/GOBBLIN-770
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current code sets Xmx to the value of the YARN container memory limit. 
> The JVM is highly likely to hit the container memory limit with this 
> configuration due to overhead costs that are not in the JVM heap.
> Configuration should be added to set JVM memory as a percentage of the 
> container memory minus a configurable overhead.
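
One possible reading of the proposal, sketched as arithmetic: heap = (container memory - overhead) * ratio. JvmHeapSizer and its parameters are hypothetical, and the order of operations (subtract the overhead first, then apply the percentage) is an assumption; the description could also be read the other way around.

```java
/** Hypothetical helper that derives an -Xmx value from a container limit. */
final class JvmHeapSizer {
  /**
   * Computes the heap size in MB as (containerMb - overheadMb) * ratio.
   * The formula is an assumed interpretation, not the confirmed one.
   */
  static int heapMb(int containerMb, int overheadMb, double ratio) {
    if (ratio <= 0.0 || ratio > 1.0) {
      throw new IllegalArgumentException("ratio must be in (0, 1]");
    }
    int usableMb = containerMb - overheadMb;
    if (usableMb <= 0) {
      throw new IllegalArgumentException("overhead exceeds container memory");
    }
    return (int) (usableMb * ratio);
  }

  /** Formats the result as a JVM flag, e.g. -Xmx2688M. */
  static String xmxFlag(int containerMb, int overheadMb, double ratio) {
    return "-Xmx" + heapMb(containerMb, overheadMb, ratio) + "M";
  }
}
```

With a 4096 MB container, a 512 MB overhead, and a ratio of 0.75, this gives (4096 - 512) * 0.75 = 2688, i.e. -Xmx2688M, leaving headroom for memory that lives outside the JVM heap.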





[jira] [Resolved] (GOBBLIN-791) Fix hanging stream on error in asynchronous execution model

2019-06-03 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-791.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2659
[https://github.com/apache/incubator-gobblin/pull/2659]

> Fix hanging stream on error in asynchronous execution model
> ---
>
> Key: GOBBLIN-791
> URL: https://issues.apache.org/jira/browse/GOBBLIN-791
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The asynchronous task execution model uses ReactiveX streams with a 
> ConnectableFlowable. This is a hot flowable, so it does not terminate when 
> all subscribers have exited. This results in the extractor continuing to emit 
> records after downstream constructs have exited due to an error. This is very 
> problematic for extractors that introduce waits on control message acks, since 
> the extractor may hang.
> Another issue is that errors do not propagate upwards, so errors in the writer 
> do not fail the fork. Change the state of the fork to a failure state when 
> onCancel() is called, so that the task fails.
>  





[jira] [Resolved] (GOBBLIN-783) Fix the double referencing issue for job type config

2019-05-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-783.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2646
[https://github.com/apache/incubator-gobblin/pull/2646]

> Fix the double referencing issue for job type config
> 
>
> Key: GOBBLIN-783
> URL: https://issues.apache.org/jira/browse/GOBBLIN-783
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-766) Emit Workunits created event in Apache gobblin

2019-05-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-766.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2636
[https://github.com/apache/incubator-gobblin/pull/2636]

> Emit Workunits created event in Apache gobblin
> -
>
> Key: GOBBLIN-766
> URL: https://issues.apache.org/jira/browse/GOBBLIN-766
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: kraman
>Priority: Minor
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Emit a new workunits created metric to be captured for monitoring/alerting.





[jira] [Created] (GOBBLIN-787) Add an option to include the task start time in the output file name

2019-05-28 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-787:
-

 Summary: Add an option to include the task start time in the 
output file name
 Key: GOBBLIN-787
 URL: https://issues.apache.org/jira/browse/GOBBLIN-787
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


In some cases a task may be scheduled to run on multiple workers. One case 
where this happens is when running with the Helix task execution framework. 
Helix may reschedule a task on a different worker if it loses contact with a 
worker. The original worker may continue executing for some time before the 
task is terminated. During this period, if the output file names collide, there 
may be an error during data publish.

Add an option "writer.addTaskTimestamp" that can be used to reduce the chance 
of name collisions by appending a task startup timestamp to the file name.
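
A rough sketch of how such an option could affect the file name. OutputFileNames and the exact name layout (base, then timestamp, then extension, separated by dots) are assumptions for illustration; only the option name "writer.addTaskTimestamp" comes from the description above.

```java
/** Hypothetical builder for writer output file names. */
final class OutputFileNames {
  /**
   * Builds "base.extension", or "base.timestamp.extension" when the
   * addTaskTimestamp flag is set. The layout is an assumed example.
   */
  static String build(String base, String extension,
                      boolean addTaskTimestamp, long taskStartMillis) {
    StringBuilder name = new StringBuilder(base);
    if (addTaskTimestamp) {
      name.append('.').append(taskStartMillis);
    }
    return name.append('.').append(extension).toString();
  }
}
```

Two attempts of the same task that start at different times then produce different names, so a rescheduled attempt is less likely to collide with output from the original worker. Collisions remain possible if two attempts share the same start timestamp.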





[jira] [Created] (GOBBLIN-791) Fix hanging stream on error in asynchronous execution model

2019-05-31 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-791:
-

 Summary: Fix hanging stream on error in asynchronous execution 
model
 Key: GOBBLIN-791
 URL: https://issues.apache.org/jira/browse/GOBBLIN-791
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


The asynchronous task execution model uses ReactiveX streams with a 
ConnectableFlowable. This is a hot flowable, so it does not terminate when all 
subscribers have exited. This results in the extractor continuing to emit 
records after downstream constructs have exited due to an error. This is very 
problematic for extractors that introduce waits on control message acks, since 
the extractor may hang.

Another issue is that errors do not propagate upwards, so errors in the writer 
do not fail the fork. Change the state of the fork to a failure state when 
onCancel() is called, so that the task fails.

 





[jira] [Created] (GOBBLIN-798) Cleanup workflows from Helix when the Gobblin application master starts

2019-06-06 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-798:
-

 Summary: Cleanup workflows from Helix when the Gobblin application 
master starts
 Key: GOBBLIN-798
 URL: https://issues.apache.org/jira/browse/GOBBLIN-798
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


If the application master aborts, a new one may be spawned by YARN. The second 
application master will resubmit the jobs. This results in duplicate jobs in 
Helix and multiple instances of the job may run, resulting in duplicate data.

The Gobblin application master should clean up all workflows on startup to 
avoid executing multiple instances of a job.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GOBBLIN-798) Clean up workflows from Helix when the Gobblin application master starts

2019-06-06 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-798:
--
Summary: Clean up workflows from Helix when the Gobblin application master 
starts  (was: Cleanup workflows from Helix when the Gobblin application master 
starts)

> Clean up workflows from Helix when the Gobblin application master starts
> 
>
> Key: GOBBLIN-798
> URL: https://issues.apache.org/jira/browse/GOBBLIN-798
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
>
> If the application master aborts, a new one may be spawned by YARN. The second 
> application master will resubmit the jobs. This results in duplicate jobs in 
> Helix and multiple instances of the job may run, resulting in duplicate data.
> The Gobblin application master should clean up all workflows on startup to 
> avoid executing multiple instances of a job.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck

2019-05-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-780.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2644
[https://github.com/apache/incubator-gobblin/pull/2644]

> Handle scenarios that cause the YarnAutoScalingManager to be stuck
> --
>
> Key: GOBBLIN-780
> URL: https://issues.apache.org/jira/browse/GOBBLIN-780
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a 
> ScheduledExecutorService in YarnAutoScalingManager. If the runnable 
> encounters an exception, the executor service will stop scheduling it. 
> Catch all exceptions in the runnable, log them, and do not re-raise.
> Issue 2: The auto scaler may reduce the container count to 0. Helix will not 
> schedule any flows if there are no participants connected. This results in 
> the auto scaler keeping the container count at 0 and no progress is made. Fix 
> this by not allowing the container count to be reduced below 1.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-787) Add an option to include the task start time in the output file name

2019-05-29 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-787.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2653
[https://github.com/apache/incubator-gobblin/pull/2653]

> Add an option to include the task start time in the output file name
> 
>
> Key: GOBBLIN-787
> URL: https://issues.apache.org/jira/browse/GOBBLIN-787
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In some cases a task may be scheduled to run on multiple workers. One case 
> where this happens is when running with the Helix task execution framework. 
> Helix may reschedule a task on a different worker if it loses contact with a 
> worker. That worker may continue executing for some time before the task is 
> terminated. During this period if the output file names collide then there 
> may be an error during data publish.
> Add an option "writer.addTaskTimestamp" that can be used to reduce the chance 
> of name collisions by appending a task startup timestamp to the file name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-800) Remove the metric context cache from GobblinMetricsRegistry

2019-06-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-800.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2667
[https://github.com/apache/incubator-gobblin/pull/2667]

> Remove the metric context cache from GobblinMetricsRegistry
> ---
>
> Key: GOBBLIN-800
> URL: https://issues.apache.org/jira/browse/GOBBLIN-800
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Remove the metric context cache from GobblinMetricsRegistry



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-813) Make SFDC connector support encrypted Salesforce client id and client secret

2019-06-25 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-813.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2677
[https://github.com/apache/incubator-gobblin/pull/2677]

> Make SFDC connector support encrypted Salesforce client id and client secret
> 
>
> Key: GOBBLIN-813
> URL: https://issues.apache.org/jira/browse/GOBBLIN-813
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-799) Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and FIXED

2019-06-17 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-799.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2666
[https://github.com/apache/incubator-gobblin/pull/2666]

> Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and 
> FIXED
> --
>
> Key: GOBBLIN-799
> URL: https://issues.apache.org/jira/browse/GOBBLIN-799
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zihan Li
>Priority: Minor
> Fix For: 0.15.0
>
>
> AvroSchemaCheckDefaultStrategy does not return after checking the ENUM and 
> FIXED types; a return statement needs to be added in each case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-798) Clean up workflows from Helix when the Gobblin application master starts

2019-06-10 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-798.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2665
[https://github.com/apache/incubator-gobblin/pull/2665]

> Clean up workflows from Helix when the Gobblin application master starts
> 
>
> Key: GOBBLIN-798
> URL: https://issues.apache.org/jira/browse/GOBBLIN-798
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If the application master aborts, a new one may be spawned by YARN. The second 
> application master will resubmit the jobs. This results in duplicate jobs in 
> Helix and multiple instances of the job may run, resulting in duplicate data.
> The Gobblin application master should clean up all workflows on startup to 
> avoid executing multiple instances of a job.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner

2019-05-10 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-767.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2630
[https://github.com/apache/incubator-gobblin/pull/2630]

> Support different time units in TimeBasedWriterPartitioner
> --
>
> Key: GOBBLIN-767
> URL: https://issues.apache.org/jira/browse/GOBBLIN-767
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, `TimeBasedWriterPartitioner` assumes the timestamp value from a 
> record is in milliseconds. The task is to remove that assumption and support 
> timestamps in other units, with milliseconds as the default.
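
A minimal sketch of the unit handling, assuming the unit is supplied as a string naming a java.util.concurrent.TimeUnit constant. The class TimestampUnits and the method toMillis are hypothetical; the actual configuration key and parsing used by `TimeBasedWriterPartitioner` are not shown in this issue.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of Gobblin. */
final class TimestampUnits {
    private TimestampUnits() {}

    /**
     * Converts a record timestamp to milliseconds. A null or blank unit name
     * falls back to milliseconds, matching the default described above.
     */
    static long toMillis(long timestamp, String unitName) {
        TimeUnit unit = (unitName == null || unitName.trim().isEmpty())
            ? TimeUnit.MILLISECONDS
            : TimeUnit.valueOf(unitName.trim().toUpperCase(Locale.ROOT));
        return unit.toMillis(timestamp);
    }
}
```

An unknown unit name makes TimeUnit.valueOf throw IllegalArgumentException, so a misconfigured unit fails fast instead of silently producing wrong partitions.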



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-769.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2632
[https://github.com/apache/incubator-gobblin/pull/2632]

> Support string record timestamp in TimeBasedAvroWriterPartitioner
> -
>
> Key: GOBBLIN-769
> URL: https://issues.apache.org/jira/browse/GOBBLIN-769
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, if a record timestamp is a string, 
> `TimeBasedAvroWriterPartitioner` cannot recognize it and falls back to the 
> current time.
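
The intended behavior can be sketched as follows, assuming a string timestamp holds a decimal number. The class RecordTimestamps and the method resolve are hypothetical. CharSequence is used rather than String because Avro string values are often not java.lang.String instances.

```java
/** Hypothetical helper for illustration; not part of Gobblin. */
final class RecordTimestamps {
    private RecordTimestamps() {}

    /**
     * Returns the timestamp carried by value, accepting both numeric and
     * string forms; anything else yields the supplied fallback (for example
     * the current time).
     */
    static long resolve(Object value, long fallbackMillis) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        if (value instanceof CharSequence) {
            try {
                return Long.parseLong(value.toString().trim());
            } catch (NumberFormatException e) {
                // Not a numeric string: fall back rather than fail the record.
                return fallbackMillis;
            }
        }
        return fallbackMillis;
    }
}
```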



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory

2019-05-14 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-770:
-

 Summary: Add JVM configuration to avoid exhausting YARN container 
memory 
 Key: GOBBLIN-770
 URL: https://issues.apache.org/jira/browse/GOBBLIN-770
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


The current code sets Xmx to the value of the YARN container memory limit. The 
JVM is highly likely to hit the container memory limit with this configuration 
due to overhead costs that are not in the JVM heap.

Configuration should be added to set JVM memory as a percentage of the 
container memory minus a configurable overhead.
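
One way to read the proposal is heap = container memory * ratio - overhead. The sketch below uses that reading; the order of operations, the class JvmMemorySketch, and the parameter names are assumptions, not the configuration that was eventually added.

```java
/** Hypothetical calculation for illustration; not part of Gobblin. */
final class JvmMemorySketch {
    private JvmMemorySketch() {}

    /** Heap size in MB: a fraction of the container memory minus a fixed overhead. */
    static int xmxMb(int containerMemoryMb, double xmxRatio, int overheadMb) {
        if (containerMemoryMb <= 0 || overheadMb < 0) {
            throw new IllegalArgumentException("Invalid memory settings");
        }
        if (xmxRatio <= 0.0 || xmxRatio > 1.0) {
            throw new IllegalArgumentException("Ratio must be in (0, 1]");
        }
        int xmx = (int) (containerMemoryMb * xmxRatio) - overheadMb;
        if (xmx <= 0) {
            throw new IllegalArgumentException("Overhead leaves no room for the heap");
        }
        return xmx;
    }

    /** Formats the result as a JVM argument. */
    static String xmxFlag(int containerMemoryMb, double xmxRatio, int overheadMb) {
        return "-Xmx" + xmxMb(containerMemoryMb, xmxRatio, overheadMb) + "M";
    }
}
```

With a 4096 MB container, a ratio of 0.75, and 512 MB of overhead, this yields a 2560 MB heap, leaving 1536 MB for memory the JVM uses outside the heap.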



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran reassigned GOBBLIN-770:
-

Assignee: Hung Tran

> Add JVM configuration to avoid exhausting YARN container memory 
> 
>
> Key: GOBBLIN-770
> URL: https://issues.apache.org/jira/browse/GOBBLIN-770
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
>
> The current code sets Xmx to the value of the YARN container memory limit. 
> The JVM is highly likely to hit the container memory limit with this 
> configuration due to overhead costs that are not in the JVM heap.
> Configuration should be added to set JVM memory as a percentage of the 
> container memory minus a configurable overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-777) Remove container request after container allocation

2019-05-21 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-777:
-

 Summary: Remove container request after container allocation
 Key: GOBBLIN-777
 URL: https://issues.apache.org/jira/browse/GOBBLIN-777
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Due to YARN-1902, a request for containers may allocate more containers than 
desired since the requests are not automatically removed when a container is 
allocated.

The Gobblin YarnService needs to work around this issue by removing a matching 
container request in the container allocation callback.
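
The bookkeeping behind this workaround can be sketched with an in-memory list of outstanding requests. ContainerRequestTracker and its nested Request type are hypothetical stand-ins; a real implementation would also have to remove the request from the YARN client, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical tracker for illustration; not Gobblin's YarnService. */
final class ContainerRequestTracker {
    /** Minimal stand-in for a container request: memory and cores only. */
    static final class Request {
        final int memoryMb;
        final int vcores;

        Request(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    private final List<Request> pending = new ArrayList<>();

    synchronized void addRequest(Request request) {
        pending.add(request);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    /**
     * Intended to be called from the allocation callback. Removes at most one
     * pending request matching the allocated container and reports whether a
     * match was found.
     */
    synchronized boolean onContainerAllocated(int memoryMb, int vcores) {
        for (Iterator<Request> it = pending.iterator(); it.hasNext();) {
            Request request = it.next();
            if (request.memoryMb == memoryMb && request.vcores == vcores) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```

Removing exactly one request per allocation keeps the number of outstanding requests equal to the number of containers still wanted.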



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-780) Handle scenarios that causes the YarnAutoScalingManager to be stuck

2019-05-23 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-780:
-

 Summary: Handle scenarios that causes the YarnAutoScalingManager 
to be stuck
 Key: GOBBLIN-780
 URL: https://issues.apache.org/jira/browse/GOBBLIN-780
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a 
ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters 
an exception, the executor service will stop scheduling it. Catch all 
exceptions in the runnable, log them, and do not re-raise.
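
A minimal sketch of the fix for Issue 1: a wrapper that catches everything thrown by the wrapped runnable so that a ScheduledExecutorService keeps scheduling it. The class SafeRunnable is hypothetical and the logging is reduced to System.err; it is not the actual YarnAutoScalingRunnable code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper for illustration; not part of Gobblin. */
final class SafeRunnable implements Runnable {
    private final Runnable delegate;
    private final AtomicInteger failures = new AtomicInteger();

    SafeRunnable(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (Throwable t) {
            // Log and swallow: a throwable escaping run() would stop a
            // fixed-rate schedule from ever running this task again.
            failures.incrementAndGet();
            System.err.println("Scheduled run failed, will run again on the next tick: " + t);
        }
    }

    int failureCount() {
        return failures.get();
    }
}
```

It would be passed to the scheduler in place of the raw task, for example scheduler.scheduleAtFixedRate(new SafeRunnable(task), 0, 60, TimeUnit.SECONDS).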

Issue 2: The auto scaler may reduce the container count to 0. Helix will not 
schedule any flows if there are no participants connected. This results in the 
auto scaler keeping the container count at 0 and no progress is made. Fix this 
by not allowing the container count to be reduced below 1.
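
The fix for Issue 2 amounts to clamping the computed target. In the sketch below the class ContainerCountSketch and the constant MIN_CONTAINERS are hypothetical names; only the lower bound of 1 comes from this issue.

```java
/** Hypothetical helper for illustration; not part of Gobblin. */
final class ContainerCountSketch {
    // Lower bound from the issue: keep at least one participant connected.
    static final int MIN_CONTAINERS = 1;

    private ContainerCountSketch() {}

    /** Returns the computed target, but never fewer than MIN_CONTAINERS. */
    static int targetContainers(int computedContainers) {
        return Math.max(MIN_CONTAINERS, computedContainers);
    }
}
```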

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck

2019-05-23 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-780:
--
Summary: Handle scenarios that cause the YarnAutoScalingManager to be stuck 
 (was: Handle scenarios that causes the YarnAutoScalingManager to be stuck)

> Handle scenarios that cause the YarnAutoScalingManager to be stuck
> --
>
> Key: GOBBLIN-780
> URL: https://issues.apache.org/jira/browse/GOBBLIN-780
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>
> Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a 
> ScheduledExecutorService in YarnAutoScalingManager. If the runnable 
> encounters an exception, the executor service will stop scheduling it. 
> Catch all exceptions in the runnable, log them, and do not re-raise.
> Issue 2: The auto scaler may reduce the container count to 0. Helix will not 
> schedule any flows if there are no participants connected. This results in 
> the auto scaler keeping the container count at 0 and no progress is made. Fix 
> this by not allowing the container count to be reduced below 1.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-763) Support fields removal for compaction dedup key schema

2019-05-08 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-763.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2627
[https://github.com/apache/incubator-gobblin/pull/2627]

> Support fields removal for compaction dedup key schema
> --
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> - Remove fields, specified by configuration 
> `compaction.job.key.fieldBlacklist`, while computing compaction dedup key 
> schema
> - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which 
> only keeps the first field of any schema, dropping all other fields which 
> have the same schema. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-777) Remove container request after container allocation

2019-05-21 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-777.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2641
[https://github.com/apache/incubator-gobblin/pull/2641]

> Remove container request after container allocation
> ---
>
> Key: GOBBLIN-777
> URL: https://issues.apache.org/jira/browse/GOBBLIN-777
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Due to YARN-1902, a request for containers may allocate more containers than 
> desired since the requests are not automatically removed when a container is 
> allocated.
> The Gobblin YarnService needs to work around this issue by removing a 
> matching container request in the container allocation callback.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-762.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2626
[https://github.com/apache/incubator-gobblin/pull/2626]

> Add automatic scaling for Gobblin on YARN
> -
>
> Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.
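
The scaling rule in the description can be sketched as a pure decision function. The class ScalingDecisionSketch, the Action enum, and the parameter names are hypothetical; how the target is computed from Helix workflows is not covered here.

```java
/** Hypothetical decision logic for illustration; not Gobblin's YarnService. */
final class ScalingDecisionSketch {
    enum Action { REQUEST_MORE, RELEASE_UNUSED, NONE }

    private ScalingDecisionSketch() {}

    /**
     * Mirrors the description: request more containers when the target
     * exceeds what was previously requested, release unused containers when
     * the target is below the number currently allocated.
     */
    static Action decide(int target, int previouslyRequested, int allocated) {
        if (target > previouslyRequested) {
            return Action.REQUEST_MORE;
        }
        if (target < allocated) {
            return Action.RELEASE_UNUSED;
        }
        return Action.NONE;
    }

    /** Number of additional containers to ask for; never negative. */
    static int containersToRequest(int target, int previouslyRequested) {
        return Math.max(0, target - previouslyRequested);
    }
}
```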



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-774) Send nack when a control message handler fails in Fork

2019-05-20 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-774:
-

 Summary: Send nack when a control message handler fails in Fork
 Key: GOBBLIN-774
 URL: https://issues.apache.org/jira/browse/GOBBLIN-774
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Fork will raise an error without acking or nacking if the control message 
handler raises an error. This can result in another thread waiting 
indefinitely for a control message ack.

Fork.consumeRecordStream() should handle control message exceptions by calling 
nack() with the exception before re-raising the error.
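
The pattern described above can be sketched as follows. The interfaces AckCallback and Handler and the method dispatch are hypothetical stand-ins for Gobblin's control message types; the point is only that nack() is called before the exception propagates.

```java
/** Hypothetical dispatch logic for illustration; not Gobblin's Fork. */
final class ControlMessageSketch {
    /** Hypothetical callback that a waiting thread blocks on. */
    interface AckCallback {
        void ack();
        void nack(Throwable error);
    }

    /** Hypothetical control message handler that may fail. */
    interface Handler {
        void handle(String message) throws Exception;
    }

    private ControlMessageSketch() {}

    /** Handles the message, then acks; on failure nacks and rethrows. */
    static void dispatch(String message, Handler handler, AckCallback callback) throws Exception {
        try {
            handler.handle(message);
        } catch (Exception e) {
            // Release any thread waiting on the ack before failing.
            callback.nack(e);
            throw e;
        }
        callback.ack();
    }
}
```

Because the nack happens before the rethrow, a waiter is released in the failure case while the error still propagates to the caller.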



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-774) Send nack when a control message handler fails in Fork

2019-05-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-774.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2639
[https://github.com/apache/incubator-gobblin/pull/2639]

> Send nack when a control message handler fails in Fork
> --
>
> Key: GOBBLIN-774
> URL: https://issues.apache.org/jira/browse/GOBBLIN-774
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fork will raise an error without acking or nacking if the control message 
> handler raises an error. This can result in another thread waiting 
> indefinitely for a control message ack.
> Fork.consumeRecordStream() should handle control message exceptions by 
> calling nack() with the exception before re-raising the error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-761) Fix runtime property like Topic.name not available in Compaction when fetching configStore object

2019-05-01 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-761.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2625
[https://github.com/apache/incubator-gobblin/pull/2625]

> Fix runtime property like Topic.name not available in Compaction when 
> fetching configStore object
> -
>
> Key: GOBBLIN-761
> URL: https://issues.apache.org/jira/browse/GOBBLIN-761
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-764) Allow passing of rest.li parameters to throttling client

2019-05-06 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-764.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2628
[https://github.com/apache/incubator-gobblin/pull/2628]

> Allow passing of rest.li parameters to throttling client
> 
>
> Key: GOBBLIN-764
> URL: https://issues.apache.org/jira/browse/GOBBLIN-764
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-743) Initialize Gobblin application master services with dynamic config

2019-04-20 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-743:
-

 Summary: Initialize Gobblin application master services with 
dynamic config
 Key: GOBBLIN-743
 URL: https://issues.apache.org/jira/browse/GOBBLIN-743
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The Gobblin application manager needs to initialize services with the config 
generated by the dynamic config generator. One use case that requires this is 
the passing of SSL configuration to Kafka consumers and producers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-726) Enable Schema Verification During Primary Dataset Deployment

2019-04-19 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-726.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2593
[https://github.com/apache/incubator-gobblin/pull/2593]

> Enable Schema Verification During Primary Dataset Deployment
> 
>
> Key: GOBBLIN-726
> URL: https://issues.apache.org/jira/browse/GOBBLIN-726
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Each distcp mapper will first read the schema of the file to be copied, and 
> abort if the file schema does not match the expected schema. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-02 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-762:
--
Description: 
Gobblin on YARN needs a way to scale up and down the containers based on the 
workload.

Added `YarnAutoScalingManager` which can be started by the 
`GobblinApplicationMaster` by setting the 
`gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
scheduled task with a default interval of 60 seconds to detect the number of 
required partitions for the workflows submitted to Helix. It will request the 
`YarnService` to scale to a computed number of containers. If the requested 
number of containers is higher than the YarnService has previously requested 
then it will request more containers. If the requested count is less than the 
current number of allocated containers then it will free any unused containers.

  was:Gobblin on YARN needs a way to scale up and down the containers based on 
the workload.


> Add automatic scaling for Gobblin on YARN
> -
>
> Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-02 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-762:
-

 Summary: Add automatic scaling for Gobblin on YARN
 Key: GOBBLIN-762
 URL: https://issues.apache.org/jira/browse/GOBBLIN-762
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


Gobblin on YARN needs a way to scale up and down the containers based on the 
workload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-821) Create Code Coverage Report for Gobblin

2019-07-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-821.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2684
[https://github.com/apache/incubator-gobblin/pull/2684]

> Create Code Coverage Report for Gobblin
> ---
>
> Key: GOBBLIN-821
> URL: https://issues.apache.org/jira/browse/GOBBLIN-821
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (GOBBLIN-738) Open a way to customize decoding KafkaConsumerRecord

2019-04-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-738.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2605
[https://github.com/apache/incubator-gobblin/pull/2605]

> Open a way to customize decoding KafkaConsumerRecord
> 
>
> Key: GOBBLIN-738
> URL: https://issues.apache.org/jira/browse/GOBBLIN-738
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, decoding a `KafkaConsumerRecord` is limited to 2 forms:
>   - decode as a `ByteArrayBasedKafkaRecord` message
>   - convert value from a `DecodeableKafkaRecord` message
> The task is to allow arbitrary decoding mechanisms to be plugged in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-725) add a mysql based job-status store

2019-04-10 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-725.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2592
[https://github.com/apache/incubator-gobblin/pull/2592]

> add a mysql based job-status store
> --
>
> Key: GOBBLIN-725
> URL: https://issues.apache.org/jira/browse/GOBBLIN-725
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GOBBLIN-719) gobblin-docs has invalid git links

2019-04-10 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-719.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2586
[https://github.com/apache/incubator-gobblin/pull/2586]

> gobblin-docs has invalid git links
> --
>
> Key: GOBBLIN-719
> URL: https://issues.apache.org/jira/browse/GOBBLIN-719
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jay Sen
>Priority: Trivial
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Gobblin docs had some invalid links, pointing both to the LinkedIn repo 
> and to old locations of classes that have changed since then.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GOBBLIN-739) Add a way to propagate the Azkaban config to Gobblin on YARN

2019-04-16 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-739:
-

 Summary: Add a way to propagate the Azkaban config to Gobblin on 
YARN
 Key: GOBBLIN-739
 URL: https://issues.apache.org/jira/browse/GOBBLIN-739
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application 
master on YARN, which then loads configuration from an application.conf file. 
Currently, the application.conf is pre-generated and packaged with the Azkaban 
job zip. This results in duplication of config between the Azkaban job 
properties and the application.conf file. It also doesn't allow user overrides 
in the Azkaban UI to be propagated to the app master and containers.

A config should be added to specify an output path to which the Azkaban job 
config is written in HOCON format. Gobblin YARN configs such as 
gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local can 
then be set to point to the output file.
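
A minimal sketch of the conversion step, assuming the Azkaban job config is available as java.util.Properties. The class HoconWriterSketch is hypothetical. It writes one `key = "value"` line per property, leaves dotted keys unquoted so that they are read as paths, and rejects keys that would need quoting. It does not handle a key that is both a value and a prefix of another key.

```java
import java.util.Properties;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical writer for illustration; not part of Gobblin. */
final class HoconWriterSketch {
    // Conservative key pattern: dotted segments of letters, digits, '_' and '-'.
    private static final Pattern SAFE_KEY =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_-]*(\\.[A-Za-z_][A-Za-z0-9_-]*)*");

    private HoconWriterSketch() {}

    /** Renders the properties as HOCON text, one sorted key per line. */
    static String toHocon(Properties props) {
        StringBuilder out = new StringBuilder();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (!SAFE_KEY.matcher(key).matches()) {
                throw new IllegalArgumentException("Key not supported by this sketch: " + key);
            }
            out.append(key).append(" = ").append(quote(props.getProperty(key))).append('\n');
        }
        return out.toString();
    }

    /** Quotes a value as a JSON-style string, escaping special characters. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use a four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Quoting every value keeps the output unambiguous at the cost of turning numbers and booleans into strings, which a config reader would then need to convert when a typed value is requested.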



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GOBBLIN-739) Add a way to propagate the Azkaban job config to Gobblin on YARN

2019-04-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-739:
--
Summary: Add a way to propagate the Azkaban job config to Gobblin on YARN  
(was: Add a way to propagate the Azkaban config to Gobblin on YARN)

> Add a way to propagate the Azkaban job config to Gobblin on YARN
> 
>
> Key: GOBBLIN-739
> URL: https://issues.apache.org/jira/browse/GOBBLIN-739
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
>
> The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application 
> master on YARN, which then loads configuration from an application.conf file. 
> Currently, the application.conf is pre-generated and packaged with the 
> Azkaban job zip. This results in duplication of config between the Azkaban 
> job properties and the application.conf file. It also doesn't allow user 
> overrides in the Azkaban UI to be propagated to the app master and containers.
> A config should be added to specify an output path to which the Azkaban job 
> config is written in HOCON format. Gobblin YARN configs such as 
> gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local 
> can then be set to point to the output file.





[jira] [Resolved] (GOBBLIN-747) Set expected schema when creating workunits

2019-04-23 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-747.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2612
[https://github.com/apache/incubator-gobblin/pull/2612]

> Set expected schema when creating workunits
> ---
>
> Key: GOBBLIN-747
> URL: https://issues.apache.org/jira/browse/GOBBLIN-747
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Set the gobblin.copy.expectedSchema property when creating the workunit to 
> enable the schema check in distcp.
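
A rough sketch of the idea, using java.util.Properties as a stand-in for a Gobblin workunit. Only the property name gobblin.copy.expectedSchema comes from the issue; the class, the method names, and the plain string comparison are assumptions made for illustration, and the real distcp check may compare schemas differently:

```java
import java.util.Properties;

// Hypothetical stand-in for workunit creation and the later schema check.
public class ExpectedSchemaSketch {

    static final String EXPECTED_SCHEMA_KEY = "gobblin.copy.expectedSchema";

    // At workunit creation time, record the schema the copy is expected to see.
    static Properties createWorkUnit(String sourceSchema) {
        Properties workUnit = new Properties();
        workUnit.setProperty(EXPECTED_SCHEMA_KEY, sourceSchema);
        return workUnit;
    }

    // At copy time, compare the recorded schema with the one actually read.
    // Assumption: the check is skipped when the property is absent.
    static boolean schemaCheckPasses(Properties workUnit, String actualSchema) {
        String expected = workUnit.getProperty(EXPECTED_SCHEMA_KEY);
        return expected == null || expected.equals(actualSchema);
    }
}
```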





[jira] [Resolved] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level

2019-08-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-851.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2707
[https://github.com/apache/incubator-gobblin/pull/2707]

> Provide capability to disable hive schema registration in partition level
> -
>
> Key: GOBBLIN-851
> URL: https://issues.apache.org/jira/browse/GOBBLIN-851
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We had problems when the table-level schema and the partition-level schema 
> diverge. Consider the case where a user registers two partitions, 2019/08/10 
> and 2019/08/11, but the schema changes in between (S1 -> S2). The table now 
> has schema S2, but partition 2019/08/10 still has schema S1. 
> A query using the latest schema will then fail on the old partition.
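
The scenario above can be reproduced with a toy model. This is not Hive or Gobblin code: the class and its methods are hypothetical, schemas are plain strings, and it assumes that a partition without its own registered schema is read with the table schema:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of table-level vs. partition-level schema registration.
public class PartitionSchemaModel {

    private final boolean registerPartitionSchema;
    private final Map<String, String> partitionSchemas = new HashMap<>();
    private String tableSchema;

    PartitionSchemaModel(boolean registerPartitionSchema) {
        this.registerPartitionSchema = registerPartitionSchema;
    }

    // Registering a partition always moves the table to the latest schema;
    // the partition keeps its own copy only if partition-level registration is on.
    void registerPartition(String partition, String schema) {
        tableSchema = schema;
        if (registerPartitionSchema) {
            partitionSchemas.put(partition, schema);
        }
    }

    // Schema used to read a partition: its own if present, else the table's.
    String effectiveSchema(String partition) {
        return partitionSchemas.getOrDefault(partition, tableSchema);
    }

    // True when the partition would be read with a schema other than the table's.
    boolean diverges(String partition) {
        return !effectiveSchema(partition).equals(tableSchema);
    }
}
```

With partition-level registration enabled, registering 2019/08/10 with S1 and then 2019/08/11 with S2 leaves 2019/08/10 on S1 while the table is on S2. With it disabled, both partitions resolve to S2 in this model.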



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (GOBBLIN-857) Extending getTopicsFromConfigStore to accept topicName directly

2019-08-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-857.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2713
[https://github.com/apache/incubator-gobblin/pull/2713]

> Extending getTopicsFromConfigStore to accept topicName directly
> ---
>
> Key: GOBBLIN-857
> URL: https://issues.apache.org/jira/browse/GOBBLIN-857
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-862) Security token encryption support in SFDC connector

2019-08-21 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-862.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2718
[https://github.com/apache/incubator-gobblin/pull/2718]

> Security token encryption support in SFDC connector
> ---
>
> Key: GOBBLIN-862
> URL: https://issues.apache.org/jira/browse/GOBBLIN-862
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: gobblin-salesforce
>Reporter: Monish Vachhani
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>
> Support security token encryption in the SFDC connector so that the 
> security token is not stored as plain text.
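
As a generic illustration of keeping a token out of plain text, the sketch below encrypts and decrypts a string with AES-GCM from the standard javax.crypto API. It is not the mechanism used by the pull request, and the class name, method names, and encoding layout (Base64 of IV followed by ciphertext) are assumptions for this example only:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: AES-GCM round trip for a security token.
public class TokenCrypto {

    private static final int IV_BYTES = 12;   // 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Returns Base64(IV || ciphertext+tag); a fresh random IV is used per call.
    static String encrypt(String token, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses encrypt(); fails if the key is wrong or the data was modified.
    static String decrypt(String encoded, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

The job config would then carry only the encrypted string, with the key supplied separately at run time.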



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

