[jira] [Resolved] (GOBBLIN-576) Send partition level lineage in hive distcp
[ https://issues.apache.org/jira/browse/GOBBLIN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-576. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request #2442 [https://github.com/apache/incubator-gobblin/pull/2442] > Send partition level lineage in hive distcp > --- > > Key: GOBBLIN-576 > URL: https://issues.apache.org/jira/browse/GOBBLIN-576 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > Fix For: 0.14.0 > > > Currently hive distcp only supports dataset/table level lineage. The task is > to send lineage at the table partition level if any. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-567) Create config store that downloads and reads from a local jar
[ https://issues.apache.org/jira/browse/GOBBLIN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-567. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request #2430 [https://github.com/apache/incubator-gobblin/pull/2430] > Create config store that downloads and reads from a local jar > - > > Key: GOBBLIN-567 > URL: https://issues.apache.org/jira/browse/GOBBLIN-567 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Jack Moseley >Assignee: Jack Moseley >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-589) Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event
[ https://issues.apache.org/jira/browse/GOBBLIN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618280#comment-16618280 ] Hung Tran commented on GOBBLIN-589: --- Additional changes in pull request #2457 https://github.com/apache/incubator-gobblin/pull/2457 > Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event > -- > > Key: GOBBLIN-589 > URL: https://issues.apache.org/jira/browse/GOBBLIN-589 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Carl Shen >Priority: Major > Fix For: 0.14.0 > > > Add start/stop fetch epoch times so that true extractor ingestion values can > be used instead of estimations. > Add previous\{start/stop fetch epoch times, low/high watermarks}: saves extra > queries required to do lag operations for monitoring purposes. > Add partitionTotalSize and undecodableMessageCount -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-589) Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event
[ https://issues.apache.org/jira/browse/GOBBLIN-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-589. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request #2455 [https://github.com/apache/incubator-gobblin/pull/2455] > Add more Gobblin tracking metrics in KafkaExtractorTopicMetadata event > -- > > Key: GOBBLIN-589 > URL: https://issues.apache.org/jira/browse/GOBBLIN-589 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Carl Shen >Priority: Major > Fix For: 0.14.0 > > > Add start/stop fetch epoch times and partition total size: so that true > extractor ingestion values can be used instead of estimations. > Add previous\{start/stop fetch epoch times, low/high watermarks}: saves extra > queries required to do lag operations for monitoring purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-586) Feature to apply retention in remote HDFS
[ https://issues.apache.org/jira/browse/GOBBLIN-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-586. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request #2452 [https://github.com/apache/incubator-gobblin/pull/2452] > Feature to apply retention in remote HDFS > - > > Key: GOBBLIN-586 > URL: https://issues.apache.org/jira/browse/GOBBLIN-586 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Karthik Amarnath >Priority: Minor > Fix For: 0.14.0 > > > Enhancement feature to apply retention in remote HDFS by reading the > configuration from local HDFS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-591) Allow user to pass in the customized http client for azkaban client
[ https://issues.apache.org/jira/browse/GOBBLIN-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-591. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request #2458 [https://github.com/apache/incubator-gobblin/pull/2458] > Allow user to pass in the customized http client for azkaban client > --- > > Key: GOBBLIN-591 > URL: https://issues.apache.org/jira/browse/GOBBLIN-591 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-671) Close the underlying writer when a HiveWritableHdfsDataWriter is closed
Hung Tran created GOBBLIN-671: - Summary: Close the underlying writer when a HiveWritableHdfsDataWriter is closed Key: GOBBLIN-671 URL: https://issues.apache.org/jira/browse/GOBBLIN-671 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran The HiveWritableHdfsDataWriter does not close the underlying writer when close() is called, so it holds onto the writer's resources after the close. For some underlying writers, such as an OrcRecordWriter, this can leave a large amount of memory buffered, which leads to OOMs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
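The behaviour asked for in GOBBLIN-671 can be illustrated with a minimal sketch. All class names below are hypothetical stand-ins, not Gobblin's actual HiveWritableHdfsDataWriter or Hive's OrcRecordWriter; the only point shown is that the wrapper's close() propagates to the writer it delegates to, so buffered memory is released.

```java
import java.io.Closeable;

// Hypothetical names throughout; this is not Gobblin's real writer hierarchy.
class DelegatingWriterSketch {

  /** Stand-in for an underlying record writer that buffers data in memory. */
  static class BufferingWriter implements Closeable {
    private StringBuilder buffer = new StringBuilder();
    private boolean closed = false;

    void write(String record) {
      if (closed) {
        throw new IllegalStateException("writer already closed");
      }
      buffer.append(record).append('\n');
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      buffer = null; // release the buffered memory
      closed = true;
    }
  }

  /** Wrapper whose close() also closes the writer it delegates to. */
  static class WrappingDataWriter implements Closeable {
    private final BufferingWriter underlying;

    WrappingDataWriter(BufferingWriter underlying) {
      this.underlying = underlying;
    }

    void write(String record) {
      underlying.write(record);
    }

    @Override
    public void close() {
      // Without this call the underlying writer would keep its buffer alive.
      underlying.close();
    }
  }

  public static void main(String[] args) {
    BufferingWriter inner = new BufferingWriter();
    try (WrappingDataWriter writer = new WrappingDataWriter(inner)) {
      writer.write("row-1");
    }
    System.out.println("underlying closed: " + inner.isClosed());
  }
}
```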
[jira] [Resolved] (GOBBLIN-721) Gobblin streaming recipe is broken
[ https://issues.apache.org/jira/browse/GOBBLIN-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-721. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2588 [https://github.com/apache/incubator-gobblin/pull/2588] > Gobblin streaming recipe is broken > -- > > Key: GOBBLIN-721 > URL: https://issues.apache.org/jira/browse/GOBBLIN-721 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-core >Reporter: Shirshanka Das >Assignee: Shirshanka Das >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Running the simple streaming pull file results in a "negative acks" problem > because the internal pipeline has been re-architected to ack automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-703) Allow planning job to be run in a non-blocking way
[ https://issues.apache.org/jira/browse/GOBBLIN-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-703. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2573 [https://github.com/apache/incubator-gobblin/pull/2573] > Allow planning job to be run in a non-blocking way > -- > > Key: GOBBLIN-703 > URL: https://issues.apache.org/jira/browse/GOBBLIN-703 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Today every planning job runs in a dedicated thread pool and waits until > the full execution has completed. This requires a lot of system resources and > a dedicated monitoring thread. The improvement here is to reduce the waiting > time on that monitoring thread. Once the planning job has been submitted to > Helix, we don't need to wait for the job to complete; job status monitoring > is handled by GaaS monitoring. This frees most of the thread pool resources, > because each monitoring thread returns immediately after it finishes the job > submission. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-706) Making KafkaSource dynamically determine the number of mapper
[ https://issues.apache.org/jira/browse/GOBBLIN-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-706. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2576 [https://github.com/apache/incubator-gobblin/pull/2576] > Making KafkaSource dynamically determine the number of mapper > - > > Key: GOBBLIN-706 > URL: https://issues.apache.org/jira/browse/GOBBLIN-706 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-717) Filter Out Empty MultiWorkUnits
[ https://issues.apache.org/jira/browse/GOBBLIN-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-717. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2584 [https://github.com/apache/incubator-gobblin/pull/2584] > Filter Out Empty MultiWorkUnits > --- > > Key: GOBBLIN-717 > URL: https://issues.apache.org/jira/browse/GOBBLIN-717 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Zihan Li >Priority: Major > Fix For: 0.15.0 > > > When we run a job, Gobblin uses the value of max mappers or the target > size of a mapper to determine the number of mappers. But because one partition > cannot be divided into several WorkUnits, work cannot be evenly distributed, > and many mappers (MultiWorkUnits) have no work to do. This wastes a lot of > resources. So we need to filter out MultiWorkUnits that contain no WorkUnit > when we determine the number of mappers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
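The filtering idea in GOBBLIN-717 can be sketched as follows. This is an illustration only: each MultiWorkUnit is modelled as a plain list of strings, and the class and method names are hypothetical rather than taken from Gobblin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model: a "bucket" stands in for a MultiWorkUnit, a String for a WorkUnit.
class EmptyBucketFilterSketch {

  /** Returns only the buckets that contain at least one work unit. */
  static List<List<String>> dropEmpty(List<List<String>> buckets) {
    List<List<String>> nonEmpty = new ArrayList<>();
    for (List<String> bucket : buckets) {
      if (!bucket.isEmpty()) {
        nonEmpty.add(bucket);
      }
    }
    return nonEmpty;
  }

  public static void main(String[] args) {
    List<List<String>> buckets = Arrays.asList(
        Arrays.asList("partition-0", "partition-1"),
        Collections.<String>emptyList(),
        Arrays.asList("partition-2"),
        Collections.<String>emptyList());

    // Only non-empty buckets should count towards the number of mappers.
    System.out.println("buckets with work: " + dropEmpty(buckets).size());
  }
}
```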
[jira] [Resolved] (GOBBLIN-712) Add version strategy for configbased dataset copy
[ https://issues.apache.org/jira/browse/GOBBLIN-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-712. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2579 [https://github.com/apache/incubator-gobblin/pull/2579] > Add version strategy for configbased dataset copy > - > > Key: GOBBLIN-712 > URL: https://issues.apache.org/jira/browse/GOBBLIN-712 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-723) Add support to the LogCopier for copying from multiple source paths
[ https://issues.apache.org/jira/browse/GOBBLIN-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-723. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2590 [https://github.com/apache/incubator-gobblin/pull/2590] > Add support to the LogCopier for copying from multiple source paths > --- > > Key: GOBBLIN-723 > URL: https://issues.apache.org/jira/browse/GOBBLIN-723 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The LogCopier should support multiple source paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-722) add option to unschedule a gaas flow
[ https://issues.apache.org/jira/browse/GOBBLIN-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-722. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2589 [https://github.com/apache/incubator-gobblin/pull/2589] > add option to unschedule a gaas flow > > > Key: GOBBLIN-722 > URL: https://issues.apache.org/jira/browse/GOBBLIN-722 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Fix For: 0.15.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-723) Add support to the LogCopier for copying from multiple source paths
Hung Tran created GOBBLIN-723: - Summary: Add support to the LogCopier for copying from multiple source paths Key: GOBBLIN-723 URL: https://issues.apache.org/jira/browse/GOBBLIN-723 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran The LogCopier should support multiple source paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-716) Add lineage in FileBasedSource
[ https://issues.apache.org/jira/browse/GOBBLIN-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-716. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2583 [https://github.com/apache/incubator-gobblin/pull/2583] > Add lineage in FileBasedSource > -- > > Key: GOBBLIN-716 > URL: https://issues.apache.org/jira/browse/GOBBLIN-716 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Add lineage in `FileBasedSource` > - By default, `FileBasedSource` marks dataset level source lineage > - A `PartitionedFileSourceBase` marks partition level source lineage > Fix destinations overwritten in `LineageInfo.putDestination(List > descriptors, int branchId, State state)`. Multiple calls should append given > descriptors -- This message was sent by Atlassian JIRA (v7.6.3#76005)
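The append semantics described for `LineageInfo.putDestination` in GOBBLIN-716 can be illustrated with a small sketch. It does not reproduce Gobblin's `LineageInfo` or its `State` parameter; descriptors are modelled as strings, and the class below is a hypothetical stand-in that only shows repeated calls for the same branch accumulating instead of overwriting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; descriptors are plain strings here.
class LineageAppendSketch {
  private final Map<Integer, List<String>> destinationsByBranch = new HashMap<>();

  /** Appends the given descriptors to whatever is already stored for the branch. */
  void putDestination(List<String> descriptors, int branchId) {
    destinationsByBranch
        .computeIfAbsent(branchId, id -> new ArrayList<>())
        .addAll(descriptors);
  }

  List<String> getDestinations(int branchId) {
    return destinationsByBranch.getOrDefault(branchId, Collections.<String>emptyList());
  }

  public static void main(String[] args) {
    LineageAppendSketch lineage = new LineageAppendSketch();
    lineage.putDestination(Arrays.asList("dest-a"), 0);
    lineage.putDestination(Arrays.asList("dest-b", "dest-c"), 0);
    System.out.println(lineage.getDestinations(0));
  }
}
```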
[jira] [Resolved] (GOBBLIN-709) Provide an option to disallow concurrent flow executions in Gobblin-as-a-Service
[ https://issues.apache.org/jira/browse/GOBBLIN-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-709. --- Resolution: Fixed Issue resolved by pull request #2580 [https://github.com/apache/incubator-gobblin/pull/2580] > Provide an option to disallow concurrent flow executions in > Gobblin-as-a-Service > > > Key: GOBBLIN-709 > URL: https://issues.apache.org/jira/browse/GOBBLIN-709 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Provide an option to disallow concurrent flow executions in > Gobblin-as-a-Service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-713) Lazy load job specification from job catalog to avoid OOM issue when JobCatalog is bootup.
[ https://issues.apache.org/jira/browse/GOBBLIN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-713. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2581 [https://github.com/apache/incubator-gobblin/pull/2581] > Lazy load job specification from job catalog to avoid OOM issue when > JobCatalog is bootup. > -- > > Key: GOBBLIN-713 > URL: https://issues.apache.org/jira/browse/GOBBLIN-713 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Today, whenever the job catalog is restarted, all the job specs are loaded > into memory. This can cause OOM issues under our production load. This ticket > was created to provide an easy way to load a job spec without materializing > all the job specs into memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
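One common way to avoid materializing every spec at once is to hand out an iterator that loads each spec only when it is requested. The sketch below assumes that approach; the ticket does not say how PR #2581 implements lazy loading, and the class, method, and URI names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical catalog: specs are strings and "loading" is simulated.
class LazySpecCatalogSketch {
  private final List<String> specUris;
  private int loadCount = 0;

  LazySpecCatalogSketch(List<String> specUris) {
    this.specUris = specUris;
  }

  /** Simulates an expensive load of a single spec. */
  private String load(String uri) {
    loadCount++;
    return "spec@" + uri;
  }

  int getLoadCount() {
    return loadCount;
  }

  /** Specs are loaded one at a time, only when next() is called. */
  Iterator<String> specIterator() {
    final Iterator<String> uris = specUris.iterator();
    return new Iterator<String>() {
      @Override
      public boolean hasNext() {
        return uris.hasNext();
      }

      @Override
      public String next() {
        return load(uris.next());
      }
    };
  }

  public static void main(String[] args) {
    LazySpecCatalogSketch catalog =
        new LazySpecCatalogSketch(Arrays.asList("job-a", "job-b", "job-c"));
    Iterator<String> specs = catalog.specIterator();
    System.out.println("loaded before iteration: " + catalog.getLoadCount());
    System.out.println("first spec: " + specs.next());
    System.out.println("loaded after one next(): " + catalog.getLoadCount());
  }
}
```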
[jira] [Resolved] (GOBBLIN-690) Relaunch check for the planning job is not correct
[ https://issues.apache.org/jira/browse/GOBBLIN-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-690. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2562 [https://github.com/apache/incubator-gobblin/pull/2562] > Relaunch check for the planning job is not correct > -- > > Key: GOBBLIN-690 > URL: https://issues.apache.org/jira/browse/GOBBLIN-690 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-692) Add support to query last K flow executions in Gobblin-as-a-Service (GaaS)
[ https://issues.apache.org/jira/browse/GOBBLIN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-692. --- Resolution: Fixed Issue resolved by pull request #2564 [https://github.com/apache/incubator-gobblin/pull/2564] > Add support to query last K flow executions in Gobblin-as-a-Service (GaaS) > -- > > Key: GOBBLIN-692 > URL: https://issues.apache.org/jira/browse/GOBBLIN-692 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, REST APIs only support retrieving the latest execution of a flow. > We enhance the APIs to query the last K executions where K is passed as a > query parameter. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-693) Add ORC hive serde manager
Hung Tran created GOBBLIN-693: - Summary: Add ORC hive serde manager Key: GOBBLIN-693 URL: https://issues.apache.org/jira/browse/GOBBLIN-693 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran Add an ORC hive serde manager to register ORC datasets in Hive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-677) Allow for early termination of Gobblin jobs based on a predicate on job progress
[ https://issues.apache.org/jira/browse/GOBBLIN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-677. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2548 [https://github.com/apache/incubator-gobblin/pull/2548] > Allow for early termination of Gobblin jobs based on a predicate on job > progress > > > Key: GOBBLIN-677 > URL: https://issues.apache.org/jira/browse/GOBBLIN-677 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-684) Ensure buffered messages are flushed before close() in KafkaProducerPusher
[ https://issues.apache.org/jira/browse/GOBBLIN-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-684. --- Resolution: Fixed Issue resolved by pull request #2556 [https://github.com/apache/incubator-gobblin/pull/2556] > Ensure buffered messages are flushed before close() in KafkaProducerPusher > -- > > Key: GOBBLIN-684 > URL: https://issues.apache.org/jira/browse/GOBBLIN-684 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-metrics >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Sudarshan Vasudevan >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, when KafkaProducerPusher is closed, it invokes > KafkaProducer#close(). However, close() only guarantees delivery of in-flight > messages, not the messages in the producer buffer waiting to be sent out. > This results in data loss. > The fix ensures that we call flush() before close(). As a result, any > buffered messages are immediately pushed out and we block until the messages > are acked. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
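The flush-then-close ordering from GOBBLIN-684 can be sketched without a Kafka dependency. The `BufferedProducer` below is a hypothetical fake that models the behaviour as the ticket describes it (close() drops whatever is still buffered); it is not the real KafkaProducer and makes no claim about that class's actual semantics. `Pusher` is likewise a stand-in for KafkaProducerPusher.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fake; models the ticket's description, not the real Kafka client.
class FlushBeforeCloseSketch {

  static class BufferedProducer {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    void send(String message) {
      buffer.add(message);
    }

    /** Pushes every buffered message out. */
    void flush() {
      delivered.addAll(buffer);
      buffer.clear();
    }

    /** In this fake, closing discards anything still sitting in the buffer. */
    void close() {
      buffer.clear();
    }

    List<String> getDelivered() {
      return delivered;
    }
  }

  static class Pusher {
    private final BufferedProducer producer;

    Pusher(BufferedProducer producer) {
      this.producer = producer;
    }

    void push(String message) {
      producer.send(message);
    }

    void close() {
      producer.flush(); // make sure buffered messages go out first
      producer.close();
    }
  }

  public static void main(String[] args) {
    BufferedProducer producer = new BufferedProducer();
    Pusher pusher = new Pusher(producer);
    pusher.push("metric-1");
    pusher.push("metric-2");
    pusher.close();
    System.out.println("delivered: " + producer.getDelivered());
  }
}
```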
[jira] [Resolved] (GOBBLIN-669) Configuration Properties Glossary section of Docs hard to read
[ https://issues.apache.org/jira/browse/GOBBLIN-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-669. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2538 [https://github.com/apache/incubator-gobblin/pull/2538] > Configuration Properties Glossary section of Docs hard to read > -- > > Key: GOBBLIN-669 > URL: https://issues.apache.org/jira/browse/GOBBLIN-669 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Christian Soseman >Priority: Trivial > Labels: documentation > Fix For: 0.15.0 > > Original Estimate: 48h > Time Spent: 10m > Remaining Estimate: 47h 50m > > The following section of the documentation is really hard to comb through: > [https://gobblin.readthedocs.io/en/latest/user-guide/Configuration-Properties-Glossary/] > > I believe that tables would work much easier and make it easier to find > properties on this page. The current setup makes it hard to tell where one > property starts and the next one stops. > > *I'm currently working on a resolution and will submit the PR for approval. > No other resources are required for this particular task outside of review.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-666) Data too long for column 'property_key'
[ https://issues.apache.org/jira/browse/GOBBLIN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-666. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2539 [https://github.com/apache/incubator-gobblin/pull/2539] > Data too long for column 'property_key' > --- > > Key: GOBBLIN-666 > URL: https://issues.apache.org/jira/browse/GOBBLIN-666 > Project: Apache Gobblin > Issue Type: Bug > Components: state-management >Affects Versions: 0.14.0 >Reporter: Francis Laforge >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We may get the following error:
> {noformat}
> com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'property_key' at row 1
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3876)
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814)
>   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2478)
>   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625)
>   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551)
>   at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
>   at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
>   at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009)
>   at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5094)
>   at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1994)
>   at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
>   at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
>   at org.apache.gobblin.metastore.database.DatabaseJobHistoryStoreV100.updateProperty(DatabaseJobHistoryStoreV100.java:523)
>   at org.apache.gobblin.metastore.database.DatabaseJobHistoryStoreV100.put(DatabaseJobHistoryStoreV100.java:244)
>   at org.apache.gobblin.metastore.DatabaseJobHistoryStore.put(DatabaseJobHistoryStore.java:77)
>   at org.apache.gobblin.runtime.JobContext.storeJobExecutionInfo(JobContext.java:406)
>   at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:490)
>   at org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:479)
>   at org.apache.gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:435)
>   at org.apache.gobblin.scheduler.JobScheduler$GobblinJob.executeImpl(JobScheduler.java:598)
>   at org.apache.gobblin.scheduler.BaseGobblinJob.execute(BaseGobblinJob.java:58)
>   at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>   at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
> {noformat}
> when property_key is too long. Unfortunately, some keys are automatically
> generated and can be very long. For example, when using Parquet with
> partitioning we may have this type of key:
> {noformat}
> construct.final.state.FORK_OPERATOR.0.WRITER.RecordsWritten_partition1=val_name_1/partition2=2017-12-08/partition3=8/partition4=22812
> {noformat}
> which is longer than the 128 characters allowed by the MySQL table definition.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
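The length problem reported in GOBBLIN-666 can be reproduced with a small check. The 128-character limit and the sample key come from the ticket; the class and method names are hypothetical, and the sketch only detects an over-long key. It does not show how PR #2539 resolves the issue, which the message above does not state.

```java
// Hypothetical helper; only the limit (128) and the sample key come from the ticket.
class PropertyKeyLengthSketch {
  static final int MAX_KEY_LENGTH = 128;

  /** True when the key fits in a column that holds at most MAX_KEY_LENGTH characters. */
  static boolean fits(String key) {
    return key.length() <= MAX_KEY_LENGTH;
  }

  public static void main(String[] args) {
    String generatedKey =
        "construct.final.state.FORK_OPERATOR.0.WRITER.RecordsWritten_"
            + "partition1=val_name_1/partition2=2017-12-08/partition3=8/partition4=22812";
    System.out.println("key length: " + generatedKey.length());
    System.out.println("fits in property_key: " + fits(generatedKey));
  }
}
```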
[jira] [Resolved] (GOBBLIN-682) Create a new constructor for DatasetCleanerJob
[ https://issues.apache.org/jira/browse/GOBBLIN-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-682. --- Resolution: Fixed Issue resolved by pull request #2554 [https://github.com/apache/incubator-gobblin/pull/2554] > Create a new constructor for DatasetCleanerJob > -- > > Key: GOBBLIN-682 > URL: https://issues.apache.org/jira/browse/GOBBLIN-682 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-azkaban >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Sudarshan Vasudevan >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The current DatasetCleanerJob constructor only accepts config passed as > azkaban.utils.Props. Here, we implement a new constructor that also accepts > java.util.Properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-698) Enhance logging to print job and flow details when a job is orchestrated by GaaS
[ https://issues.apache.org/jira/browse/GOBBLIN-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-698. --- Resolution: Fixed Issue resolved by pull request #2569 [https://github.com/apache/incubator-gobblin/pull/2569] > Enhance logging to print job and flow details when a job is orchestrated by > GaaS > > > Key: GOBBLIN-698 > URL: https://issues.apache.org/jira/browse/GOBBLIN-698 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We enhance logging in GaaS to add job and flow details when a job is > orchestrated by GaaS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-691) Make compaction implementation format-insensitive
[ https://issues.apache.org/jira/browse/GOBBLIN-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-691. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2563 [https://github.com/apache/incubator-gobblin/pull/2563] > Make compaction implementation format-insensitive > - > > Key: GOBBLIN-691 > URL: https://issues.apache.org/jira/browse/GOBBLIN-691 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-688) Make FsJobStatusRetriever config more scoped
[ https://issues.apache.org/jira/browse/GOBBLIN-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-688. --- Resolution: Fixed Issue resolved by pull request #2560 [https://github.com/apache/incubator-gobblin/pull/2560] > Make FsJobStatusRetriever config more scoped > > > Key: GOBBLIN-688 > URL: https://issues.apache.org/jira/browse/GOBBLIN-688 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The proposed enhancement adds a configuration prefix to FsJobStatusRetriever > config to make it more scoped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-683) Azkaban client should retry if session gets expired
[ https://issues.apache.org/jira/browse/GOBBLIN-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-683. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2555 [https://github.com/apache/incubator-gobblin/pull/2555] > Azkaban client should retry if session gets expired > --- > > Key: GOBBLIN-683 > URL: https://issues.apache.org/jira/browse/GOBBLIN-683 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-687) Pass TopologySpec map to DagManager to allow reuse of SpecExecutors during DAG deserialization
[ https://issues.apache.org/jira/browse/GOBBLIN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-687. --- Resolution: Fixed Issue resolved by pull request #2559 [https://github.com/apache/incubator-gobblin/pull/2559] > Pass TopologySpec map to DagManager to allow reuse of SpecExecutors during > DAG deserialization > -- > > Key: GOBBLIN-687 > URL: https://issues.apache.org/jira/browse/GOBBLIN-687 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > DagManager maintains state of all currently executing DAGs, by serializing > each DAG on compilation and persisting it to a durable store. The serialized > DAG includes Job config as well as the SpecExecutor config for each job in > the DAG. This is done to correctly resume execution of DAGs in case of > service restarts or leadership change. > Currently, on service restart/leadership change, the new master de-serializes > SpecExecutor config and creates a SpecExecutor instance for each job in the > DAG. If the number of DAGs is large, this can result in many connections to > the underlying executor instance. The proposed fix allows the DagManager to > re-use the SpecExecutor instances created by the TopologySpecFactory when it > deserializes a DAG. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
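The reuse described in GOBBLIN-687 amounts to looking up an existing executor instead of constructing a new one for every deserialized job. The sketch below assumes executors are keyed by a URI string and that an unknown URI is an error; both are assumptions for illustration, and none of the names are Gobblin's DagManager, SpecExecutor, or TopologySpecFactory APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; keying by URI and failing on unknown URIs are assumptions.
class ExecutorReuseSketch {

  static class Executor {
    final String uri;

    Executor(String uri) {
      this.uri = uri;
    }
  }

  private final Map<String, Executor> executorsByUri;

  /** The map is created elsewhere (e.g. at topology setup) and passed in. */
  ExecutorReuseSketch(Map<String, Executor> executorsByUri) {
    this.executorsByUri = executorsByUri;
  }

  /** Resolves a serialized executor reference to an already-created instance. */
  Executor resolve(String serializedUri) {
    Executor existing = executorsByUri.get(serializedUri);
    if (existing == null) {
      throw new IllegalArgumentException("no executor registered for " + serializedUri);
    }
    return existing;
  }

  public static void main(String[] args) {
    Map<String, Executor> topology = new HashMap<>();
    topology.put("cluster-a", new Executor("cluster-a"));

    ExecutorReuseSketch manager = new ExecutorReuseSketch(topology);
    Executor first = manager.resolve("cluster-a");
    Executor second = manager.resolve("cluster-a");
    System.out.println("same instance reused: " + (first == second));
  }
}
```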
[jira] [Resolved] (GOBBLIN-686) handle schema mismatch in compatible schemas
[ https://issues.apache.org/jira/browse/GOBBLIN-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-686. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2558 [https://github.com/apache/incubator-gobblin/pull/2558] > handle schema mismatch in compatible schemas > > > Key: GOBBLIN-686 > URL: https://issues.apache.org/jira/browse/GOBBLIN-686 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-696) Provide an "explain" option to return a compiled flow when a flow config is added.
[ https://issues.apache.org/jira/browse/GOBBLIN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-696. --- Resolution: Fixed Issue resolved by pull request #2567 [https://github.com/apache/incubator-gobblin/pull/2567] > Provide an "explain" option to return a compiled flow when a flow config is > added. > -- > > Key: GOBBLIN-696 > URL: https://issues.apache.org/jira/browse/GOBBLIN-696 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We add support for an "explain" option in Gobblin-as-a-Service (GaaS) flow > creation requests to return the expected output of flow compilation. The > "explain" option allows end users to validate their FlowConfig requests by > ensuring that: 1. the request results in a successful compilation and 2. that > the compiled output is as expected. Further, the "explain" option allows > users to query GaaS without any side-effects i.e. no FlowSpecs are actually > created/scheduled on GaaS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-704) Adding serde props for ORCSerDe initialization
[ https://issues.apache.org/jira/browse/GOBBLIN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-704. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2574 [https://github.com/apache/incubator-gobblin/pull/2574] > Adding serde props for ORCSerDe initialization > -- > > Key: GOBBLIN-704 > URL: https://issues.apache.org/jira/browse/GOBBLIN-704 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-689) catch unchecked exceptions in KafkaSource
[ https://issues.apache.org/jira/browse/GOBBLIN-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-689. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2561 [https://github.com/apache/incubator-gobblin/pull/2561] > catch unchecked exceptions in KafkaSource > - > > Key: GOBBLIN-689 > URL: https://issues.apache.org/jira/browse/GOBBLIN-689 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Arjun Singh Bora >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-702) Fix bug by reusable OrcStruct
[ https://issues.apache.org/jira/browse/GOBBLIN-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-702. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2572 [https://github.com/apache/incubator-gobblin/pull/2572] > Fix bug by reusable OrcStruct > -- > > Key: GOBBLIN-702 > URL: https://issues.apache.org/jira/browse/GOBBLIN-702 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-697) Allow distcp to carry over file version independently of modtime
[ https://issues.apache.org/jira/browse/GOBBLIN-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-697. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2568 [https://github.com/apache/incubator-gobblin/pull/2568] > Allow distcp to carry over file version independently of modtime > > > Key: GOBBLIN-697 > URL: https://issues.apache.org/jira/browse/GOBBLIN-697 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > One example where this might be useful is data syncing between two locations. > Relying on modification times to detect data changes may lead to a feedback > loop of copying: data is created at location A at time 0; at time 1 the data is > copied to location B; the sync mechanism might then incorrectly conclude that, because the mod > time at location B is higher, the data should be synced back to location A, and so on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
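The feedback loop described in GOBBLIN-697 can be illustrated with a minimal sketch. Everything here is hypothetical: the class VersionSyncSketch, the FileState type, and the idea of modelling the version as a plain long are assumptions made for illustration, not the distcp code from pull request #2568.

```java
// Hypothetical sketch: compares a modtime-based sync decision with a
// version-based one. Not the actual Gobblin distcp implementation.
final class VersionSyncSketch {

    /** Minimal stand-in for a file's metadata at one location. */
    static final class FileState {
        final long modTime;
        final long version;

        FileState(long modTime, long version) {
            this.modTime = modTime;
            this.version = version;
        }
    }

    /** Copying rewrites the modification time but carries the version over unchanged. */
    static FileState copy(FileState source, long copyTime) {
        return new FileState(copyTime, source.version);
    }

    /** Decision based on modification time: prone to the feedback loop. */
    static boolean needsCopyByModTime(FileState source, FileState target) {
        return target == null || source.modTime > target.modTime;
    }

    /** Decision based on the carried-over version: a plain copy is never "newer". */
    static boolean needsCopyByVersion(FileState source, FileState target) {
        return target == null || source.version > target.version;
    }

    public static void main(String[] args) {
        FileState a = new FileState(0L, 1L); // created at location A at time 0
        FileState b = copy(a, 1L);           // copied to location B at time 1
        System.out.println("sync B back to A by modtime: " + needsCopyByModTime(b, a));
        System.out.println("sync B back to A by version: " + needsCopyByVersion(b, a));
    }
}
```

In this sketch the copy at location B has a later modification time than the original, so a modtime comparison asks for a copy back to A, while the version comparison does not.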
[jira] [Resolved] (GOBBLIN-695) Add tools for generating binary files in avro/orc using json
[ https://issues.apache.org/jira/browse/GOBBLIN-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-695. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2566 [https://github.com/apache/incubator-gobblin/pull/2566] > Add tools for generating binary files in avro/orc using json > > > Key: GOBBLIN-695 > URL: https://issues.apache.org/jira/browse/GOBBLIN-695 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Ported from an internal product that uses Gobblin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-705) Create method to merge table props from existing hive meta table
[ https://issues.apache.org/jira/browse/GOBBLIN-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-705. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2575 [https://github.com/apache/incubator-gobblin/pull/2575] > Create method to merge table props from existing hive meta table > > > Key: GOBBLIN-705 > URL: https://issues.apache.org/jira/browse/GOBBLIN-705 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-680) Enhance error handling on task creation
[ https://issues.apache.org/jira/browse/GOBBLIN-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-680. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2551 [https://github.com/apache/incubator-gobblin/pull/2551] > Enhance error handling on task creation > > > Key: GOBBLIN-680 > URL: https://issues.apache.org/jira/browse/GOBBLIN-680 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Assignee: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-681) increase max allowed size of a job name
[ https://issues.apache.org/jira/browse/GOBBLIN-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-681. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2552 [https://github.com/apache/incubator-gobblin/pull/2552] > increase max allowed size of a job name > --- > > Key: GOBBLIN-681 > URL: https://issues.apache.org/jira/browse/GOBBLIN-681 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Assignee: Arjun Singh Bora >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-679) Refactor cluster task metrics
[ https://issues.apache.org/jira/browse/GOBBLIN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-679. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2553 [https://github.com/apache/incubator-gobblin/pull/2553] > Refactor cluster task metrics > - > > Key: GOBBLIN-679 > URL: https://issues.apache.org/jira/browse/GOBBLIN-679 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-685) Add jstack when timeout happens in EmbeddedGobblin
[ https://issues.apache.org/jira/browse/GOBBLIN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-685. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2557 [https://github.com/apache/incubator-gobblin/pull/2557] > Add jstack when timeout happens in EmbeddedGobblin > -- > > Key: GOBBLIN-685 > URL: https://issues.apache.org/jira/browse/GOBBLIN-685 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-676) Add record metadata support to the RecordEnvelope
[ https://issues.apache.org/jira/browse/GOBBLIN-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-676. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2546 [https://github.com/apache/incubator-gobblin/pull/2546] > Add record metadata support to the RecordEnvelope > - > > Key: GOBBLIN-676 > URL: https://issues.apache.org/jira/browse/GOBBLIN-676 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > > The RecordEnvelope currently only has a watermark. Add a Map to it to store > record-level metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-671) Close the underlying writer when a HiveWritableHdfsDataWriter is closed
[ https://issues.apache.org/jira/browse/GOBBLIN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-671. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2541 [https://github.com/apache/incubator-gobblin/pull/2541] > Close the underlying writer when a HiveWritableHdfsDataWriter is closed > --- > > Key: GOBBLIN-671 > URL: https://issues.apache.org/jira/browse/GOBBLIN-671 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > > The HiveWritableHdfsDataWriter does not close the underlying writer > when close() is called. This results in holding onto writer resources after > the close. For some underlying writers, such as an OrcRecordWriter, this can > result in a large amount of memory being buffered, which leads to OOMs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-628) Zuora Source
[ https://issues.apache.org/jira/browse/GOBBLIN-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-628. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2498 [https://github.com/apache/incubator-gobblin/pull/2498] > Zuora Source > > > Key: GOBBLIN-628 > URL: https://issues.apache.org/jira/browse/GOBBLIN-628 > Project: Apache Gobblin > Issue Type: Task >Reporter: Abhishek Tiwari >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > > Zuora Source connector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-673) Implement a FS based JobStatusRetriever for GaaS Flows.
[ https://issues.apache.org/jira/browse/GOBBLIN-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-673. --- Resolution: Fixed Issue resolved by pull request #2545 [https://github.com/apache/incubator-gobblin/pull/2545] > Implement a FS based JobStatusRetriever for GaaS Flows. > --- > > Key: GOBBLIN-673 > URL: https://issues.apache.org/jira/browse/GOBBLIN-673 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Sudarshan Vasudevan >Priority: Major > Fix For: 0.15.0 > > > This PR implements a FileSystem based JobStatusRetriever that makes use of > the StateStore interface. The PR also implements a KafkaJobStatusMonitor that > pulls tracking events from Kafka and writes them to an FSStateStore. The > FSJobStatusRetriever can then be used to query the status of jobs/flows from > the state store. A StateStoreCleaner thread is scheduled by the Job status > monitor to clean up the state store as configured by the retention config. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-675) Enhance FSDatasetDescriptor definition to include partition config, encryption level and compaction config.
[ https://issues.apache.org/jira/browse/GOBBLIN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-675. --- Resolution: Fixed Issue resolved by pull request #2544 [https://github.com/apache/incubator-gobblin/pull/2544] > Enhance FSDatasetDescriptor definition to include partition config, > encryption level and compaction config. > --- > > Key: GOBBLIN-675 > URL: https://issues.apache.org/jira/browse/GOBBLIN-675 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-service >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Sudarshan Vasudevan >Priority: Major > Fix For: 0.15.0 > > > Enhance FSDatasetDescriptor definition to include > # partition configuration of dataset (e.g. datetime, regex etc.) > # Add config for encryption level (e.g. file, row, field), and > # Add compaction config (e.g. plain compaction, compaction with de-dup). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-676) Add record metadata support to the RecordEnvelope
Hung Tran created GOBBLIN-676: - Summary: Add record metadata support to the RecordEnvelope Key: GOBBLIN-676 URL: https://issues.apache.org/jira/browse/GOBBLIN-676 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran The RecordEnvelope currently only has a watermark. Add a Map to it to store record-level metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-727) Skip commit in CloseOnFlushWriterWrapper if a commit has already been invoked on the underlying writer.
[ https://issues.apache.org/jira/browse/GOBBLIN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-727. --- Resolution: Fixed Issue resolved by pull request #2594 [https://github.com/apache/incubator-gobblin/pull/2594] > Skip commit in CloseOnFlushWriterWrapper if a commit has already been invoked > on the underlying writer. > --- > > Key: GOBBLIN-727 > URL: https://issues.apache.org/jira/browse/GOBBLIN-727 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-core >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We skip commit() on the underlying writer if a commit has been previously > invoked. Currently, duplicate commits on the underlying writer can result in > Exceptions due to non-existing data in task staging location (as in the case > of an FsDataWriter). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
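The "skip commit if one has already been invoked" behaviour from GOBBLIN-727 can be sketched as a small guard around the underlying writer. The class CommitOnceWriterSketch and its nested Writer interface are hypothetical stand-ins; this is not the CloseOnFlushWriterWrapper code itself.

```java
// Hypothetical sketch of a wrapper that forwards commit() to the
// underlying writer at most once. Not the actual Gobblin class.
final class CommitOnceWriterSketch {

    /** Minimal stand-in for the underlying writer. */
    interface Writer {
        void commit();
    }

    private final Writer underlying;
    private boolean committed;

    CommitOnceWriterSketch(Writer underlying) {
        this.underlying = underlying;
    }

    /** Commits the underlying writer unless a commit has already succeeded. */
    void commit() {
        if (committed) {
            return; // a second commit would find no data in staging, so skip it
        }
        underlying.commit();
        committed = true; // only marked after the underlying commit returns normally
    }

    boolean isCommitted() {
        return committed;
    }
}
```

Setting the flag only after the underlying commit returns means a failed commit can still be retried; whether that matches the real wrapper is an assumption.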
[jira] [Created] (GOBBLIN-737) Add support for Helix quota-based task scheduling
Hung Tran created GOBBLIN-737: - Summary: Add support for Helix quota-based task scheduling Key: GOBBLIN-737 URL: https://issues.apache.org/jira/browse/GOBBLIN-737 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran Support configuring Helix quota-based task scheduling through gobblin cluster configuration. The gobblin cluster config key "gobblin.cluster.helixTaskQuotaConfig" is added to store a value in the format "quota_type1:quota_value1,quota_type2:quota_value2,...". The config values are parsed and propagated to the Helix cluster configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-736) Skip flush and control message handlers on closed writers in the CloseOnFlushWriterWrapper
Hung Tran created GOBBLIN-736: - Summary: Skip flush and control message handlers on closed writers in the CloseOnFlushWriterWrapper Key: GOBBLIN-736 URL: https://issues.apache.org/jira/browse/GOBBLIN-736 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran The CloseOnFlushWriterWrapper should not operate on the underlying writer after it is closed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-737) Add support for Helix quota-based task scheduling
[ https://issues.apache.org/jira/browse/GOBBLIN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-737. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2604 [https://github.com/apache/incubator-gobblin/pull/2604] > Add support for Helix quota-based task scheduling > - > > Key: GOBBLIN-737 > URL: https://issues.apache.org/jira/browse/GOBBLIN-737 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Support configuring Helix quota-based task scheduling through gobblin cluster > configuration. The gobblin cluster config key > "gobblin.cluster.helixTaskQuotaConfig" is added to store a value in the > format "quota_type1:quota_value1,quota_type2:quota_value2,...". The config > values are parsed and propagated to the Helix cluster configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
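As a rough illustration of the "quota_type1:quota_value1,quota_type2:quota_value2,..." format described in GOBBLIN-737, the following sketch parses such a value into a map. The class name HelixQuotaConfigSketch is hypothetical, and treating quota values as integers is an assumption; how the parsed values are pushed into the Helix cluster configuration is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parses "type1:value1,type2:value2" into an ordered map.
// Assumes integer quota values; not the actual Gobblin parsing code.
final class HelixQuotaConfigSketch {

    static Map<String, Integer> parse(String value) {
        Map<String, Integer> quotas = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return quotas; // nothing configured
        }
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Expected quota_type:quota_value but got: '" + entry + "'");
            }
            // Integer.parseInt throws NumberFormatException on a non-numeric value.
            quotas.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
        return quotas;
    }

    public static void main(String[] args) {
        System.out.println(parse("quota_type1:10, quota_type2:2"));
    }
}
```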
[jira] [Resolved] (GOBBLIN-739) Add a way to propagate the Azkaban job config to Gobblin on YARN
[ https://issues.apache.org/jira/browse/GOBBLIN-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-739. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2606 [https://github.com/apache/incubator-gobblin/pull/2606] > Add a way to propagate the Azkaban job config to Gobblin on YARN > > > Key: GOBBLIN-739 > URL: https://issues.apache.org/jira/browse/GOBBLIN-739 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application > master on YARN, which then loads configuration from an application.conf file. > Currently, the application.conf is pre-generated and packaged with the > Azkaban job zip. This results in duplication of config between the Azkaban > job properties and the application.conf file. It also doesn't allow user > overrides in the Azkaban UI to be propagated to the app master and containers. > A config should be added to specify an output path to write the Azkaban job > config to in HOCON format. The gobblin yarn config such as > gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local > can be set to point to the output file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-732) Pass UGI credentials to the app master and load dynamic config in workers
Hung Tran created GOBBLIN-732: - Summary: Pass UGI credentials to the app master and load dynamic config in workers Key: GOBBLIN-732 URL: https://issues.apache.org/jira/browse/GOBBLIN-732 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran Credentials available in the Azkaban application launcher need to be passed to the Gobblin application master for distribution to the workers. The workers also need to load dynamic config based on the credentials. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-729) Add version strategy support for HiveDataset copy
[ https://issues.apache.org/jira/browse/GOBBLIN-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-729. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2596 [https://github.com/apache/incubator-gobblin/pull/2596] > Add version strategy support for HiveDataset copy > - > > Key: GOBBLIN-729 > URL: https://issues.apache.org/jira/browse/GOBBLIN-729 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This PR will add version strategy support for Hive dataset copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-733) Instrument Avro Converters to allow converter metrics emission in both batch and streaming modes
[ https://issues.apache.org/jira/browse/GOBBLIN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-733. --- Resolution: Fixed Issue resolved by pull request #2600 [https://github.com/apache/incubator-gobblin/pull/2600] > Instrument Avro Converters to allow converter metrics emission in both batch > and streaming modes > > > Key: GOBBLIN-733 > URL: https://issues.apache.org/jira/browse/GOBBLIN-733 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-core >Affects Versions: 0.15.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We change Avro converters to extend InstrumentedConverters which will allow > converter metrics to be emitted in both batch and streaming modes of > execution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-732) Pass UGI credentials to the app master and load dynamic config in workers
[ https://issues.apache.org/jira/browse/GOBBLIN-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-732. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2599 [https://github.com/apache/incubator-gobblin/pull/2599] > Pass UGI credentials to the app master and load dynamic config in workers > - > > Key: GOBBLIN-732 > URL: https://issues.apache.org/jira/browse/GOBBLIN-732 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Credentials available in the Azkaban application launcher need to be passed > to the Gobblin application master for distribution to the workers. The > workers also need to load dynamic config based on the credentials. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory
[ https://issues.apache.org/jira/browse/GOBBLIN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-770. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2634 [https://github.com/apache/incubator-gobblin/pull/2634] > Add JVM configuration to avoid exhausting YARN container memory > > > Key: GOBBLIN-770 > URL: https://issues.apache.org/jira/browse/GOBBLIN-770 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The current code sets Xmx to the value of the YARN container memory limit. > The JVM is highly likely to hit the container memory limit with this > configuration due to overhead costs that are not in the JVM heap. > Configuration should be added to set JVM memory as a percentage of the > container memory minus a configurable overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
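The sizing rule proposed in GOBBLIN-770 can be written out as a short calculation. This sketch assumes one reading of the description, namely heap = container memory x ratio - overhead; the class name JvmHeapSizingSketch and the parameter names are hypothetical and do not correspond to confirmed Gobblin configuration keys.

```java
// Hypothetical sketch: derives an -Xmx value from the YARN container size.
// Assumes heap = containerMemoryMb * xmxRatio - overheadMb.
final class JvmHeapSizingSketch {

    static int xmxMb(int containerMemoryMb, double xmxRatio, int overheadMb) {
        if (xmxRatio <= 0.0 || xmxRatio > 1.0) {
            throw new IllegalArgumentException("xmxRatio must be in (0, 1]: " + xmxRatio);
        }
        int xmx = (int) (containerMemoryMb * xmxRatio) - overheadMb;
        if (xmx <= 0) {
            throw new IllegalArgumentException(
                "Container of " + containerMemoryMb + " MB leaves no heap after overhead");
        }
        return xmx;
    }

    /** Formats the result as a JVM flag, e.g. for a container launch command. */
    static String xmxFlag(int containerMemoryMb, double xmxRatio, int overheadMb) {
        return "-Xmx" + xmxMb(containerMemoryMb, xmxRatio, overheadMb) + "M";
    }

    public static void main(String[] args) {
        // 4096 MB container, 75% ratio, 256 MB overhead
        System.out.println(xmxFlag(4096, 0.75, 256));
    }
}
```

With a 4096 MB container, a ratio of 0.75 and 256 MB of overhead, the heap comes out at 2816 MB, leaving the rest of the container for off-heap usage such as thread stacks and direct buffers.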
[jira] [Resolved] (GOBBLIN-791) Fix hanging stream on error in asynchronous execution model
[ https://issues.apache.org/jira/browse/GOBBLIN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-791. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2659 [https://github.com/apache/incubator-gobblin/pull/2659] > Fix hanging stream on error in asynchronous execution model > --- > > Key: GOBBLIN-791 > URL: https://issues.apache.org/jira/browse/GOBBLIN-791 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The asynchronous task execution model uses ReactiveX streams with a > ConnectableFlowable. This is a hot flowable, so it does not terminate when > all subscribers have exited. This results in the extractor continuing to emit > records after downstream constructs have exited due to an error. This is very > problematic for extractors that introduce waits on control message acks since > the extractor may hang. > Another issue is the errors do not propagate upwards, so errors in the writer > do not fail the fork. Change the state of the fork onCancel() to a failure > state so that the task gets failed. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-783) Fix the double referencing issue for job type config
[ https://issues.apache.org/jira/browse/GOBBLIN-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-783. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2646 [https://github.com/apache/incubator-gobblin/pull/2646] > Fix the double referencing issue for job type config > > > Key: GOBBLIN-783 > URL: https://issues.apache.org/jira/browse/GOBBLIN-783 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-766) Emit Workunits created event in Apache gobblin
[ https://issues.apache.org/jira/browse/GOBBLIN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-766. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2636 [https://github.com/apache/incubator-gobblin/pull/2636] > Emit Workunits created event in Apache gobblin > - > > Key: GOBBLIN-766 > URL: https://issues.apache.org/jira/browse/GOBBLIN-766 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: kraman >Priority: Minor > Fix For: 0.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Emit a new workunits created metric to be captured for monitoring/Alerting -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-787) Add an option to include the task start time in the output file name
Hung Tran created GOBBLIN-787: - Summary: Add an option to include the task start time in the output file name Key: GOBBLIN-787 URL: https://issues.apache.org/jira/browse/GOBBLIN-787 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran In some cases a task may be scheduled to run on multiple workers. One case where this happens is when running with the Helix task execution framework. Helix may reschedule a task on a different worker if it loses contact with a worker. That worker may continue executing for some time before the task is terminated. During this period if the output file names collide then there may be an error during data publish. Add an option "writer.addTaskTimestamp" that can be used to reduce the chance of name collisions by appending a task startup timestamp to the file name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-791) Fix hanging stream on error in asynchronous execution model
Hung Tran created GOBBLIN-791: - Summary: Fix hanging stream on error in asynchronous execution model Key: GOBBLIN-791 URL: https://issues.apache.org/jira/browse/GOBBLIN-791 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran The asynchronous task execution model uses ReactiveX streams with a ConnectableFlowable. This is a hot flowable, so it does not terminate when all subscribers have exited. This results in the extractor continuing to emit records after downstream constructs have exited due to an error. This is very problematic for extractors that introduce waits on control message acks since the extractor may hang. Another issue is the errors do not propagate upwards, so errors in the writer do not fail the fork. Change the state of the fork onCancel() to a failure state so that the task gets failed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-798) Cleanup workflows from Helix when the Gobblin application master starts
Hung Tran created GOBBLIN-798: - Summary: Cleanup workflows from Helix when the Gobblin application master starts Key: GOBBLIN-798 URL: https://issues.apache.org/jira/browse/GOBBLIN-798 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran If the application master aborts, a new one may be spawned by YARN. The second application master will resubmit the jobs. This results in duplicate jobs in Helix, and multiple instances of the job may run, resulting in duplicate data. The Gobblin application master should clean up all workflows on startup to avoid executing multiple instances of a job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-798) Clean up workflows from Helix when the Gobblin application master starts
[ https://issues.apache.org/jira/browse/GOBBLIN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran updated GOBBLIN-798: -- Summary: Clean up workflows from Helix when the Gobblin application master starts (was: Cleanup workflows from Helix when the Gobblin application master starts) > Clean up workflows from Helix when the Gobblin application master starts > > > Key: GOBBLIN-798 > URL: https://issues.apache.org/jira/browse/GOBBLIN-798 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > > If the application master aborts, a new one may be spawned by YARN. The second > application master will resubmit the jobs. This results in duplicate jobs in > Helix, and multiple instances of the job may run, resulting in duplicate data. > The Gobblin application master should clean up all workflows on startup to > avoid executing multiple instances of a job. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck
[ https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-780. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2644 [https://github.com/apache/incubator-gobblin/pull/2644] > Handle scenarios that cause the YarnAutoScalingManager to be stuck > -- > > Key: GOBBLIN-780 > URL: https://issues.apache.org/jira/browse/GOBBLIN-780 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a > ScheduledExecutorService in YarnAutoScalingManager. If the runnable > encounters an exception, the executor service will stop scheduling it. > Catch all exceptions in the runnable, log, and do not re-raise. > Issue 2: The auto scaler may reduce the container count to 0. Helix will not > schedule any flows if there are no participants connected. This results in > the auto scaler keeping the container count at 0, and no progress is made. Fix > this by not allowing the container count to be reduced below 1. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
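Both fixes described in GOBBLIN-780 can be sketched in a few lines. ScheduledExecutorService.scheduleAtFixedRate suppresses all subsequent executions once a run throws, which is why the runnable has to swallow its own failures. The class AutoScalingGuardSketch and its method names are hypothetical; the real YarnAutoScalingManager is not reproduced here, and logging is reduced to System.err for brevity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the two guards; not the actual YarnAutoScalingManager code.
final class AutoScalingGuardSketch {

    static final int MIN_CONTAINERS = 1;

    /** Issue 1: wrap the task so a failure is logged and never re-raised. */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Catching Throwable is a deliberate choice in this sketch: anything
                // that escapes would stop the executor from scheduling further runs.
                System.err.println("Suppressed failure in scheduled task: " + t);
            }
        };
    }

    /** Issue 2: never scale below one container. */
    static int clampContainerCount(int requested) {
        return Math.max(MIN_CONTAINERS, requested);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Without guarded(...), only the first run would ever execute.
        scheduler.scheduleAtFixedRate(
            guarded(() -> { throw new IllegalStateException("boom"); }),
            0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```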
[jira] [Resolved] (GOBBLIN-787) Add an option to include the task start time in the output file name
[ https://issues.apache.org/jira/browse/GOBBLIN-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-787. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2653 [https://github.com/apache/incubator-gobblin/pull/2653] > Add an option to include the task start time in the output file name > > > Key: GOBBLIN-787 > URL: https://issues.apache.org/jira/browse/GOBBLIN-787 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > In some cases a task may be scheduled to run on multiple workers. One case > where this happens is when running with the Helix task execution framework. > Helix may reschedule a task on a different worker if it loses contact with a > worker. That worker may continue executing for some time before the task is > terminated. During this period if the output file names collide then there > may be an error during data publish. > Add an option "writer.addTaskTimestamp" that can be used to reduce the chance > of name collisions by appending a task startup timestamp to the file name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
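A minimal sketch of how a task start timestamp might be worked into an output file name, as GOBBLIN-787 proposes. The class TaskTimestampFileNameSketch is hypothetical, and placing the timestamp before the file extension is an assumption; the issue only states that the timestamp is appended to the file name when "writer.addTaskTimestamp" is enabled.

```java
// Hypothetical sketch: adds the task start time to a file name so that two
// attempts of the same task are unlikely to produce colliding names.
final class TaskTimestampFileNameSketch {

    static String withTaskTimestamp(String fileName, long taskStartMillis) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            // No extension (or a leading-dot name): append at the end.
            return fileName + "." + taskStartMillis;
        }
        // Keep the extension last so readers that key on it still work.
        return fileName.substring(0, dot) + "." + taskStartMillis + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(withTaskTimestamp("part-0.avro", System.currentTimeMillis()));
    }
}
```

Two attempts started at different times yield different names; attempts that start within the same millisecond would still collide, which matches the issue's wording that the option reduces the chance of collisions rather than eliminating it.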
[jira] [Resolved] (GOBBLIN-800) Remove the metric context cache from GobblinMetricsRegistry
[ https://issues.apache.org/jira/browse/GOBBLIN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-800. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2667 [https://github.com/apache/incubator-gobblin/pull/2667] > Remove the metric context cache from GobblinMetricsRegistry > --- > > Key: GOBBLIN-800 > URL: https://issues.apache.org/jira/browse/GOBBLIN-800 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Remove the metric context cache from GobblinMetricsRegistry -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-813) Make SFDC connector support encrypted Salesforce client id and client secret
[ https://issues.apache.org/jira/browse/GOBBLIN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-813. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2677 [https://github.com/apache/incubator-gobblin/pull/2677] > Make SFDC connector support encrypted Salesforce client id and client secret > > > Key: GOBBLIN-813 > URL: https://issues.apache.org/jira/browse/GOBBLIN-813 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-799) Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and FIXED
[ https://issues.apache.org/jira/browse/GOBBLIN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-799. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2666 [https://github.com/apache/incubator-gobblin/pull/2666] > Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and > FIXED > -- > > Key: GOBBLIN-799 > URL: https://issues.apache.org/jira/browse/GOBBLIN-799 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Zihan Li >Priority: Minor > Fix For: 0.15.0 > > > AvroSchemaCheckDefaultStrategy does not return after checking the ENUM and > FIXED types; the fix is to add the missing return statements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-798) Clean up workflows from Helix when the Gobblin application master starts
[ https://issues.apache.org/jira/browse/GOBBLIN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-798. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2665 [https://github.com/apache/incubator-gobblin/pull/2665] > Clean up workflows from Helix when the Gobblin application master starts > > > Key: GOBBLIN-798 > URL: https://issues.apache.org/jira/browse/GOBBLIN-798 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > If the application master aborts a new one may be spawned by YARN. The second > application master will resubmit the jobs. This results in duplicate jobs in > Helix and multiple instances of the job may run, resulting in duplicate data. > The Gobblin application master should clean up all workflows on startup to > avoid executing multiple instances of a job. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner
[ https://issues.apache.org/jira/browse/GOBBLIN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-767. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2630 [https://github.com/apache/incubator-gobblin/pull/2630] > Support different time units in TimeBasedWriterPartitioner > -- > > Key: GOBBLIN-767 > URL: https://issues.apache.org/jira/browse/GOBBLIN-767 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > Fix For: 0.15.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Currently, `TimeBasedWriterPartitioner` assumes the timestamp value from a > record is in millis. The task is to remove the assumption and support > timestamps in other units, defaulting to millis. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
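As a rough illustration of the unit handling GOBBLIN-767 asks for, the sketch below normalizes a record timestamp to epoch millis using `java.util.concurrent.TimeUnit`. The helper name `TimestampUnits.toMillis` and the idea of passing the unit as a string are assumptions made for the example; the actual configuration key and code path in `TimeBasedWriterPartitioner` are not shown in this message.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the real partitioner may be structured differently.
final class TimestampUnits {
  // Converts a timestamp expressed in the named unit to epoch milliseconds.
  // A null or empty unit name falls back to milliseconds, matching the
  // "by default, in millis" behavior described in the issue.
  static long toMillis(long timestamp, String unitName) {
    TimeUnit unit = (unitName == null || unitName.isEmpty())
        ? TimeUnit.MILLISECONDS
        : TimeUnit.valueOf(unitName.toUpperCase(Locale.ROOT));
    return unit.toMillis(timestamp);
  }
}
```

With this helper, a record carrying `1500000000` in seconds and one carrying `1500000000000` in millis resolve to the same partition time.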
[jira] [Resolved] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner
[ https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-769. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2632 [https://github.com/apache/incubator-gobblin/pull/2632] > Support string record timestamp in TimeBasedAvroWriterPartitioner > - > > Key: GOBBLIN-769 > URL: https://issues.apache.org/jira/browse/GOBBLIN-769 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Currently, if a record timestamp is a string, > `TimeBasedAvroWriterPartitioner` will not be able to recognize it and will > use current time -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory
Hung Tran created GOBBLIN-770: - Summary: Add JVM configuration to avoid exhausting YARN container memory Key: GOBBLIN-770 URL: https://issues.apache.org/jira/browse/GOBBLIN-770 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran The current code sets Xmx to the value of the YARN container memory limit. The JVM is highly likely to hit the container memory limit with this configuration due to overhead costs that are not in the JVM heap. Configuration should be added to set JVM memory as a percentage of the container memory minus a configurable overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
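The arithmetic proposed in GOBBLIN-770 can be sketched as follows. The issue text leaves the exact formula open, so this example assumes heap = container memory x ratio - overhead; the names `JvmMemory`, `xmxMb` and `xmxFlag` are hypothetical and the eventual Gobblin configuration keys are not known from this message.

```java
// Hypothetical sketch of deriving -Xmx from the YARN container memory limit.
final class JvmMemory {
  // Assumed formula: (containerMemoryMb * xmxRatio) - overheadMb.
  static int xmxMb(int containerMemoryMb, double xmxRatio, int overheadMb) {
    if (xmxRatio <= 0.0 || xmxRatio > 1.0) {
      throw new IllegalArgumentException("xmxRatio must be in (0, 1]: " + xmxRatio);
    }
    int xmx = (int) (containerMemoryMb * xmxRatio) - overheadMb;
    if (xmx <= 0) {
      throw new IllegalArgumentException("Overhead leaves no room for the heap: " + xmx + " MB");
    }
    return xmx;
  }

  // Renders the value as a JVM command-line flag.
  static String xmxFlag(int containerMemoryMb, double xmxRatio, int overheadMb) {
    return "-Xmx" + xmxMb(containerMemoryMb, xmxRatio, overheadMb) + "M";
  }
}
```

For a 4096 MB container with a ratio of 0.8 and a 50 MB overhead this yields `-Xmx3226M`, leaving roughly 870 MB of the container for memory outside the JVM heap.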
[jira] [Assigned] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory
[ https://issues.apache.org/jira/browse/GOBBLIN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran reassigned GOBBLIN-770: - Assignee: Hung Tran > Add JVM configuration to avoid exhausting YARN container memory > > > Key: GOBBLIN-770 > URL: https://issues.apache.org/jira/browse/GOBBLIN-770 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > > The current code sets Xmx to the value of the YARN container memory limit. > The JVM is highly likely to hit the container memory limit with this > configuration due to overhead costs that are not in the JVM heap. > Configuration should be added to set JVM memory as a percentage of the > container memory minus a configurable overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-777) Remove container request after container allocation
Hung Tran created GOBBLIN-777: - Summary: Remove container request after container allocation Key: GOBBLIN-777 URL: https://issues.apache.org/jira/browse/GOBBLIN-777 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran Due to YARN-1902, a request for containers may allocate more containers than desired since the requests are not automatically removed when a container is allocated. The Gobblin YarnService needs to work around this issue by removing a matching container request in the container allocation callback. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-780) Handle scenarios that causes the YarnAutoScalingManager to be stuck
Hung Tran created GOBBLIN-780: - Summary: Handle scenarios that causes the YarnAutoScalingManager to be stuck Key: GOBBLIN-780 URL: https://issues.apache.org/jira/browse/GOBBLIN-780 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters an exception, the executor service will stop scheduling it. Catch all exceptions in the runnable, log, and do not re-raise. Issue 2: The auto scaler may reduce the container count to 0. Helix will not schedule any flows if there are no participants connected. This results in the auto scaler keeping the container count at 0 and no progress is made. Fix this by not allowing the container count to be reduced below 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck
[ https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran updated GOBBLIN-780: -- Summary: Handle scenarios that cause the YarnAutoScalingManager to be stuck (was: Handle scenarios that causes the YarnAutoScalingManager to be stuck) > Handle scenarios that cause the YarnAutoScalingManager to be stuck > -- > > Key: GOBBLIN-780 > URL: https://issues.apache.org/jira/browse/GOBBLIN-780 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Priority: Major > > Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a > ScheduledExecutorService in YarnAutoScalingManager. If the runnable > encounters an exception, the executor service will stop scheduling it. > Catch all exceptions in the runnable, log, and do not re-raise. > Issue 2: The auto scaler may reduce the container count to 0. Helix will not > schedule any flows if there are no participants connected. This results in > the auto scaler keeping the container count at 0 and no progress is made. Fix > this by not allowing the container count to be reduced below 1. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-763) Support fields removal for compaction dedup key schema
[ https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-763. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2627 [https://github.com/apache/incubator-gobblin/pull/2627] > Support fields removal for compaction dedup key schema > -- > > Key: GOBBLIN-763 > URL: https://issues.apache.org/jira/browse/GOBBLIN-763 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen >Priority: Major > Fix For: 0.15.0 > > Time Spent: 40m > Remaining Estimate: 0h > > - Remove fields, specified by configuration > `compaction.job.key.fieldBlacklist`, while computing compaction dedup key > schema > - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which > only keeps the first field of any schema, dropping all other fields which > have the same schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-777) Remove container request after container allocation
[ https://issues.apache.org/jira/browse/GOBBLIN-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-777. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2641 [https://github.com/apache/incubator-gobblin/pull/2641] > Remove container request after container allocation > --- > > Key: GOBBLIN-777 > URL: https://issues.apache.org/jira/browse/GOBBLIN-777 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Due to YARN-1902, a request for containers may allocate more containers than > desired since the requests are not automatically removed when a container is > allocated. > The Gobblin YarnService needs to work around this issue by removing a > matching container request in the container allocation callback. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN
[ https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-762. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2626 [https://github.com/apache/incubator-gobblin/pull/2626] > Add automatic scaling for Gobblin on YARN > - > > Key: GOBBLIN-762 > URL: https://issues.apache.org/jira/browse/GOBBLIN-762 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Gobblin on YARN needs a way to scale up and down the containers based on the > workload. > Added `YarnAutoScalingManager` which can be started by the > `GobblinApplicationMaster` by setting the > `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a > scheduled task with a default interval of 60 seconds to detect the number of > required partitions for the workflows submitted to Helix. It will request the > `YarnService` to scale to a computed number of containers. If the requested > number of containers is higher than the YarnService has previously requested > then it will request more containers. If the requested count is less than the > current number of allocated containers then it will free any unused > containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
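The scale-up and scale-down rules in the GOBBLIN-762 description can be sketched as plain arithmetic. Everything named here is hypothetical: the message does not say how the container count is computed from the partition count, so the sketch assumes a fixed `partitionsPerContainer` with ceiling division, and it assumes that scale-down releases at most the number of currently unused containers.

```java
// Hypothetical decision logic; not the actual YarnAutoScalingManager code.
final class ScalingDecision {
  // Assumed mapping from required partitions to containers (ceiling division).
  static int targetContainers(int requiredPartitions, int partitionsPerContainer) {
    if (partitionsPerContainer <= 0) {
      throw new IllegalArgumentException("partitionsPerContainer must be positive");
    }
    return (requiredPartitions + partitionsPerContainer - 1) / partitionsPerContainer;
  }

  // Request more only when the target exceeds what was previously requested.
  static int containersToRequest(int target, int previouslyRequested) {
    return Math.max(0, target - previouslyRequested);
  }

  // Release only when the target is below the allocated count, and never
  // more containers than are currently unused (assumption).
  static int containersToRelease(int target, int allocated, int unused) {
    if (target >= allocated) {
      return 0;
    }
    return Math.min(allocated - target, unused);
  }
}
```

Keeping the request and release decisions separate mirrors the description, which compares the target against the previously requested count in one case and against the allocated count in the other.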
[jira] [Created] (GOBBLIN-774) Send nack when a control message handler fails in Fork
Hung Tran created GOBBLIN-774: - Summary: Send nack when a control message handler fails in Fork Key: GOBBLIN-774 URL: https://issues.apache.org/jira/browse/GOBBLIN-774 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran Fork will raise an error without acking or nacking if the control message handler raises an error. This can result in another thread waiting indefinitely for a control message ack. Fork.consumeRecordStream() should handle control message exceptions by calling nack() with the exception before reraising the error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
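The nack-then-rethrow pattern requested in GOBBLIN-774 might look like the sketch below. The `Ackable` interface is declared locally as a hypothetical minimal contract and `handle` is an illustrative helper; neither is copied from Gobblin's `Fork` implementation.

```java
import java.util.function.Consumer;

// Hypothetical illustration of nacking a control message before rethrowing.
final class ControlMessageNack {
  // Minimal ack contract assumed for this sketch.
  interface Ackable {
    void ack();
    void nack(Throwable error);
  }

  // Runs the handler; on failure, nacks the message with the exception so
  // that a thread waiting on the ack is released, then rethrows.
  // Acking on success is left to the handler in this sketch.
  static <M extends Ackable> void handle(M message, Consumer<M> handler) {
    try {
      handler.accept(message);
    } catch (RuntimeException e) {
      message.nack(e);
      throw e;
    }
  }
}
```

The important property is ordering: `nack` is called before the exception propagates, so a waiter observes the failure even if the calling thread terminates.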
[jira] [Resolved] (GOBBLIN-774) Send nack when a control message handler fails in Fork
[ https://issues.apache.org/jira/browse/GOBBLIN-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-774. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2639 [https://github.com/apache/incubator-gobblin/pull/2639] > Send nack when a control message handler fails in Fork > -- > > Key: GOBBLIN-774 > URL: https://issues.apache.org/jira/browse/GOBBLIN-774 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Fork will raise an error without acking or nacking if the control message handler > raises an error. This can result in another thread waiting indefinitely for a > control message ack. > Fork.consumeRecordStream() should handle control message exceptions by calling > nack() with the exception before reraising the error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-761) Fix runtime property like Topic.name not available in Compaction when fetching configStore object
[ https://issues.apache.org/jira/browse/GOBBLIN-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-761. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2625 [https://github.com/apache/incubator-gobblin/pull/2625] > Fix runtime property like Topic.name not available in Compaction when > fetching configStore object > - > > Key: GOBBLIN-761 > URL: https://issues.apache.org/jira/browse/GOBBLIN-761 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-764) Allow passing of rest.li parameters to throttling client
[ https://issues.apache.org/jira/browse/GOBBLIN-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-764. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2628 [https://github.com/apache/incubator-gobblin/pull/2628] > Allow passing of rest.li parameters to throttling client > > > Key: GOBBLIN-764 > URL: https://issues.apache.org/jira/browse/GOBBLIN-764 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-743) Initialize Gobblin application master services with dynamic config
Hung Tran created GOBBLIN-743: - Summary: Initialize Gobblin application master services with dynamic config Key: GOBBLIN-743 URL: https://issues.apache.org/jira/browse/GOBBLIN-743 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran The Gobblin application manager needs to initialize services with the config generated by the dynamic config generator. One use case that requires this is the passing of SSL configuration to kafka consumers and producers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-726) Enable Schema Verification During Primary Dataset Deployment
[ https://issues.apache.org/jira/browse/GOBBLIN-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-726. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2593 [https://github.com/apache/incubator-gobblin/pull/2593] > Enable Schema Verification During Primary Dataset Deployment > > > Key: GOBBLIN-726 > URL: https://issues.apache.org/jira/browse/GOBBLIN-726 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Zihan Li >Priority: Major > Fix For: 0.15.0 > > > Each distcp mapper will first read the schema of the file to be copied, and > abort if the file schema does not match the expected schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN
[ https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran updated GOBBLIN-762: -- Description: Gobblin on YARN needs a way to scale up and down the containers based on the workload. Added `YarnAutoScalingManager` which can be started by the `GobblinApplicationMaster` by setting the `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a scheduled task with a default interval of 60 seconds to detect the number of required partitions for the workflows submitted to Helix. It will request the `YarnService` to scale to a computed number of containers. If the requested number of containers is higher than the YarnService has previously requested then it will request more containers. If the requested count is less than the current number of allocated containers then it will free any unused containers. was:Gobblin on YARN needs a way to scale up and down the containers based on the workload. > Add automatic scaling for Gobblin on YARN > - > > Key: GOBBLIN-762 > URL: https://issues.apache.org/jira/browse/GOBBLIN-762 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Priority: Major > > Gobblin on YARN needs a way to scale up and down the containers based on the > workload. > Added `YarnAutoScalingManager` which can be started by the > `GobblinApplicationMaster` by setting the > `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a > scheduled task with a default interval of 60 seconds to detect the number of > required partitions for the workflows submitted to Helix. It will request the > `YarnService` to scale to a computed number of containers. If the requested > number of containers is higher than the YarnService has previously requested > then it will request more containers. If the requested count is less than the > current number of allocated containers then it will free any unused > containers. 
[jira] [Created] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN
Hung Tran created GOBBLIN-762: - Summary: Add automatic scaling for Gobblin on YARN Key: GOBBLIN-762 URL: https://issues.apache.org/jira/browse/GOBBLIN-762 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Gobblin on YARN needs a way to scale up and down the containers based on the workload. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-821) Create Code Coverage Report for Gobblin
[ https://issues.apache.org/jira/browse/GOBBLIN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-821. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2684 [https://github.com/apache/incubator-gobblin/pull/2684] > Create Code Coverage Report for Gobblin > --- > > Key: GOBBLIN-821 > URL: https://issues.apache.org/jira/browse/GOBBLIN-821 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (GOBBLIN-738) Open a way to customize decoding KafkaConsumerRecord
[ https://issues.apache.org/jira/browse/GOBBLIN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-738. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2605 [https://github.com/apache/incubator-gobblin/pull/2605] > Open a way to customize decoding KafkaConsumerRecord > > > Key: GOBBLIN-738 > URL: https://issues.apache.org/jira/browse/GOBBLIN-738 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zhixiong Chen >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, decoding a `KafkaConsumerRecord` is limited to 2 forms: > - decode as a `ByteArrayBasedKafkaRecord` message > - convert value from a `DecodeableKafkaRecord` message > The task is to open a way for arbitrary decoding mechanisms. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-725) add a mysql based job-status store
[ https://issues.apache.org/jira/browse/GOBBLIN-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-725. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2592 [https://github.com/apache/incubator-gobblin/pull/2592] > add a mysql based job-status store > -- > > Key: GOBBLIN-725 > URL: https://issues.apache.org/jira/browse/GOBBLIN-725 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Arjun Singh Bora >Priority: Major > Fix For: 0.15.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-719) gobblin-docs has invalid git links
[ https://issues.apache.org/jira/browse/GOBBLIN-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-719. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2586 [https://github.com/apache/incubator-gobblin/pull/2586] > gobblin-docs has invalid git links > -- > > Key: GOBBLIN-719 > URL: https://issues.apache.org/jira/browse/GOBBLIN-719 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Jay Sen >Priority: Trivial > Fix For: 0.15.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The gobblin docs had some invalid links, pointing not only to the LinkedIn repo but also > to old locations of classes that have changed since then. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-739) Add a way to propagate the Azkaban config to Gobblin on YARN
Hung Tran created GOBBLIN-739: - Summary: Add a way to propagate the Azkaban config to Gobblin on YARN Key: GOBBLIN-739 URL: https://issues.apache.org/jira/browse/GOBBLIN-739 Project: Apache Gobblin Issue Type: Task Reporter: Hung Tran Assignee: Hung Tran The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application master on YARN, which then loads configuration from an application.conf file. Currently, the application.conf is pre-generated and packaged with the Azkaban job zip. This results in duplication of config between the Azkaban job properties and the application.conf file. It also doesn't allow user overrides in the Azkaban UI to be propagated to the app master and containers. A config should be added to specify an output path to write the Azkaban job config to in HOCON format. The gobblin yarn config such as gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local can be set to point to the output file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-739) Add a way to propagate the Azkaban job config to Gobblin on YARN
[ https://issues.apache.org/jira/browse/GOBBLIN-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran updated GOBBLIN-739: -- Summary: Add a way to propagate the Azkaban job config to Gobblin on YARN (was: Add a way to propagate the Azkaban config to Gobblin on YARN) > Add a way to propagate the Azkaban job config to Gobblin on YARN > > > Key: GOBBLIN-739 > URL: https://issues.apache.org/jira/browse/GOBBLIN-739 > Project: Apache Gobblin > Issue Type: Task >Reporter: Hung Tran >Assignee: Hung Tran >Priority: Major > > The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application > master on YARN, which then loads configuration from an application.conf file. > Currently, the application.conf is pre-generated and packaged with the > Azkaban job zip. This results in duplication of config between the Azkaban > job properties and the application.conf file. It also doesn't allow user > overrides in the Azkaban UI to be propagated to the app master and containers. > A config should be added to specify an output path to write the Azkaban job > config to in HOCON format. The gobblin yarn config such as > gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local > can be set to point to the output file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-747) Set expected schema when creating workunits
[ https://issues.apache.org/jira/browse/GOBBLIN-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-747. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2612 [https://github.com/apache/incubator-gobblin/pull/2612] > Set expected schema when creating workunits > --- > > Key: GOBBLIN-747 > URL: https://issues.apache.org/jira/browse/GOBBLIN-747 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Zihan Li >Priority: Major > Fix For: 0.15.0 > > > Set the property of gobblin.copy.expectedSchema when creating the workunit to > enable schema check in distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level
[ https://issues.apache.org/jira/browse/GOBBLIN-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-851. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2707 [https://github.com/apache/incubator-gobblin/pull/2707] > Provide capability to disable hive schema registration in partition level > - > > Key: GOBBLIN-851 > URL: https://issues.apache.org/jira/browse/GOBBLIN-851 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Problems arise when the table-level schema and the partition-level schema diverge. > Consider a user who registers two partitions, 2019/08/10 and > 2019/08/11, with a schema change in between (S1->S2). The table now has > schema S2, but partition 2019/08/10 still has schema S1. > A query using the latest schema will then fail on the old partition. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (GOBBLIN-857) Extending getTopicsFromConfigStore to accept topicName directly
[ https://issues.apache.org/jira/browse/GOBBLIN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-857. --- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2713 [https://github.com/apache/incubator-gobblin/pull/2713] > Extending getTopicsFromConfigStore to accept topicName directly > --- > > Key: GOBBLIN-857 > URL: https://issues.apache.org/jira/browse/GOBBLIN-857 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (GOBBLIN-862) Security token encryption support in SFDC connector
[ https://issues.apache.org/jira/browse/GOBBLIN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-862. --- Fix Version/s: 0.15.0 Resolution: Fixed Issue resolved by pull request #2718 [https://github.com/apache/incubator-gobblin/pull/2718] > Security token encryption support in SFDC connector > --- > > Key: GOBBLIN-862 > URL: https://issues.apache.org/jira/browse/GOBBLIN-862 > Project: Apache Gobblin > Issue Type: Task > Components: gobblin-salesforce >Reporter: Monish Vachhani >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > > Security token encryption support in SFDC connector so as not to have > security token as plain text. -- This message was sent by Atlassian Jira (v8.3.2#803003)