[jira] [Resolved] (GOBBLIN-807) TimingEvent to extend GobblinEventBuilder
[ https://issues.apache.org/jira/browse/GOBBLIN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-807. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2678 [https://github.com/apache/incubator-gobblin/pull/2678] > TimingEvent to extend GobblinEventBuilder > -- > > Key: GOBBLIN-807 > URL: https://issues.apache.org/jira/browse/GOBBLIN-807 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Vikram Bohra >Priority: Minor > Fix For: 0.15.0 > > Time Spent: 5h > Remaining Estimate: 0h > > GobblinEventBuilder and its subclasses should be used to build > GobblinTrackingEvents, and EventSubmitter should be solely responsible for > submitting events. Deprecate older roles and methods > appropriately. TimingEvent should extend GobblinEventBuilder while > maintaining older methods for backward compatibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-809) Fix rate based limit to accept double value for rate from config
[ https://issues.apache.org/jira/browse/GOBBLIN-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-809. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2676 [https://github.com/apache/incubator-gobblin/pull/2676] > Fix rate based limit to accept double value for rate from config > > > Key: GOBBLIN-809 > URL: https://issues.apache.org/jira/browse/GOBBLIN-809 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-azkaban >Reporter: Aman Gupta >Assignee: Abhishek Tiwari >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When building RateBasedLimiter, read the rate from config as a double > instead of a long. > > https://github.com/apache/incubator-gobblin/blob/7b75aa10e49db2479458aeaf81d0e233af2c186a/gobblin-utility/src/main/java/org/apache/gobblin/util/limiter/RateBasedLimiter.java#L56 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
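The change requested in GOBBLIN-809 can be illustrated with a minimal sketch. It assumes a hypothetical config key `rate.limit.per.second` and uses plain `java.util.Properties` rather than Gobblin's own config classes, so it shows only why parsing the rate as a long rejects fractional values, not the actual patch.

```java
import java.util.Properties;

final class RateConfigSketch {
    // Hypothetical key name, used for illustration only.
    static final String RATE_KEY = "rate.limit.per.second";

    /** Long-based parsing: a fractional rate such as "0.5" throws NumberFormatException. */
    static long rateAsLong(Properties props) {
        return Long.parseLong(props.getProperty(RATE_KEY));
    }

    /** Double-based parsing: both "0.5" and "10" are accepted. */
    static double rateAsDouble(Properties props) {
        return Double.parseDouble(props.getProperty(RATE_KEY));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(RATE_KEY, "0.5");
        System.out.println("rate = " + rateAsDouble(props));
    }
}
```

With double parsing, a job can be limited to less than one permit per second, which long parsing cannot express.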
[jira] [Resolved] (GOBBLIN-772) Implement Schema Comparison Strategy during Distcp
[ https://issues.apache.org/jira/browse/GOBBLIN-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-772. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2637 [https://github.com/apache/incubator-gobblin/pull/2637] > Implement Schema Comparison Strategy during Distcp > -- > > Key: GOBBLIN-772 > URL: https://issues.apache.org/jira/browse/GOBBLIN-772 > Project: Apache Gobblin > Issue Type: Task >Reporter: Zihan Li >Priority: Major > Fix For: 0.15.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > We need a schema comparison strategy to make sure the real schema and the > expected schema have matching field names and types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
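As a rough illustration of the check described in GOBBLIN-772, the sketch below models a schema as a map from field name to type name and treats two schemas as matching when both the names and the types agree. The class and the map-based schema model are assumptions made for this example; they are not Gobblin's actual strategy classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaComparisonSketch {
    /**
     * Returns true when both schemas contain exactly the same field names
     * and each field has the same type. Field order is ignored because
     * Map.equals compares entries, not iteration order.
     */
    static boolean matches(Map<String, String> expected, Map<String, String> actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("id", "long");
        expected.put("name", "string");

        Map<String, String> actual = new LinkedHashMap<>();
        actual.put("name", "string");
        actual.put("id", "int"); // same field name, different type

        System.out.println("schemas match: " + matches(expected, actual));
    }
}
```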
[jira] [Resolved] (GOBBLIN-778) Enhance SalesforceExtractor bulkConnection config for setting transport factory
[ https://issues.apache.org/jira/browse/GOBBLIN-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-778. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2642 [https://github.com/apache/incubator-gobblin/pull/2642] > Enhance SalesforceExtractor bulkConnection config for setting transport > factory > --- > > Key: GOBBLIN-778 > URL: https://issues.apache.org/jira/browse/GOBBLIN-778 > Project: Apache Gobblin > Issue Type: Task > Components: gobblin-salesforce >Reporter: Monish Vachhani >Assignee: Hung Tran >Priority: Major > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > SalesforceExtractor uses a bulk connection to connect to Salesforce via the bulk > API. Since bulkConnection is a private variable, it cannot be modified to pass a > custom transportFactory via config. > This task is to separate the config creation from the bulkApiLogin method so that > it can be overridden to pass custom params such as setTransport. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832912#comment-16832912 ] Issac Buenrostro commented on GOBBLIN-707: -- I see, I didn't realize there was so much added to `gobblin.cli`. Can we do this to avoid confusion about which options apply to each mode? {code:java} gobblin --help gobblin cli gobblin service Use "gobblin --help" for more information {code} > combine & standardize all gobblin scripts into one master script & > restructure configs accordingly > -- > > Key: GOBBLIN-707 > URL: https://issues.apache.org/jira/browse/GOBBLIN-707 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Jay Sen >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > gobblin supports multiple modes of execution ( CLI, Standalone, > cluster-master, cluster-worker, AWS, YARN, MR ) and various command line > utilities to run cli and admin commands. There is an individual script for each > of them. > Having individual scripts introduces a lot of issues > # all scripts handle gobblin variables and user parameters differently, and > this is highly inconsistent among the various gobblin scripts > # functionality around start, stop, status checking and handling PIDs, among > a lot of other things, varies vastly with the implementation of the script. > # features like GC & JVM params, log4j file selection, classpath > calculation, etc. exist in some gobblin scripts but not all, adding to an > inconsistent user experience. > # maintaining a total of 13 scripts would be too much effort. > Also, all the gobblin scripts share a lot of common code to handle params, > start, stop services, status checks, pid handling, etc. Combining all the > scripts into 1 not only makes maintenance easier but also brings clarity and > consistency. > > Solution: > 1. there can be one gobblin.sh script to handle all gobblin commands and > deployment options as per the following signature. 
NOTE: This > {{gobblin.sh }} > {{gobblin.sh }} > {{commands values: admin, cli, statestore-check, statestore-clean, > historystore-manager, classpath}} > {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, > service}} > with the above change, the following become valid commands. > {code:java} > # all under GobblinCli class > gobblin run listQuickApps -> gobblin cli run listQuickApps > gobblin run listQuickApps -> gobblin cli run listQuickApps > gobblin run -> gobblin cli run > # class: JobStateToJsonConverter > statestore-checker.sh -> gobblin statestore-checker > # class: StateStoreCleaner > statestore-clean.sh -> gobblin statestore-clean > # class: DatabaseJobHistoryStoreSchemaManager > historystore-manager.sh -> gobblin historystore-manager > # class: Cli > gobblin-admin.sh -> gobblin admin > # all gobblin deployment modes > gobblin-cluster-master.sh -> gobblin cluster-master start|stop|status > gobblin-cluster-worker.sh -> gobblin cluster-master start|stop|status > gobblin-compaction.sh -> gobblin cluster-master start|stop|status > gobblin-env.sh -> gobblin cluster-master start|stop|status > gobblin-mapreduce.sh -> gobblin cluster-master start|stop|status > gobblin-service.sh -> gobblin cluster-master start|stop|status > gobblin-standalone.sh -> gobblin cluster-master start|stop|status > gobblin-yarn.sh -> gobblin cluster-master start|stop|status > {code} > > 2. Also, configs need to be structured and deduped accordingly to make it > clear which config will be picked up for which execution mode. > > NOTE: this refactoring adds all cli and service commands to gobblin.sh and > hence changes the syntax for all commands and services. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832080#comment-16832080 ] Issac Buenrostro commented on GOBBLIN-707: -- Thanks for taking this up [~jaysen] I do see the point of cleaning up the multiple scripts that Gobblin has, however I would challenge that the cleanup should be a bit different. As you pointed out there are two types of scripts: commands and services. * For commands, the scripts are always pretty much identical, so I believe the access should always be through `GobblinCli` (i.e. implemented as `CliApplication`s). This means that instead of `gobblin statestore-checker` it should be `gobblin cli statestore-checker` and have the bash portion of the script be unique. This has the advantage that `gobblin cli --help` will list all commands, and commands are self-documenting by using the `@Alias` annotation, and even better if we use `ConstructorAndPublicMethodsCliObjectFactory` which will automatically create a help string for each one, and allow programmatic and cli access with the same input. * For services, I'm not sure how you're approaching things, but it would also be nice to have a single bash script that can handle all of them (given that, as you pointed out, they are all of the form `start|stop|status`). Re: the PR, I'm a bit confused because a lot of scripts were removed but I don't understand where the replacements are. 
I may be missing something obvious, and I apologize if that is the case :)
[jira] [Created] (GOBBLIN-764) Allow passing of rest.li parameters to throttling client
Issac Buenrostro created GOBBLIN-764: Summary: Allow passing of rest.li parameters to throttling client Key: GOBBLIN-764 URL: https://issues.apache.org/jira/browse/GOBBLIN-764 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-760) Improve retrying behavior of throttling clients
[ https://issues.apache.org/jira/browse/GOBBLIN-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-760. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2624 [https://github.com/apache/incubator-gobblin/pull/2624] > Improve retrying behavior of throttling clients > --- > > Key: GOBBLIN-760 > URL: https://issues.apache.org/jira/browse/GOBBLIN-760 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-760) Improve retrying behavior of throttling clients
Issac Buenrostro created GOBBLIN-760: Summary: Improve retrying behavior of throttling clients Key: GOBBLIN-760 URL: https://issues.apache.org/jira/browse/GOBBLIN-760 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-734) Fix speculative safety checking in HiveWritableHdfsDataWriter
[ https://issues.apache.org/jira/browse/GOBBLIN-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-734. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2601 [https://github.com/apache/incubator-gobblin/pull/2601] > Fix speculative safety checking in HiveWritableHdfsDataWriter > - > > Key: GOBBLIN-734 > URL: https://issues.apache.org/jira/browse/GOBBLIN-734 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable
[ https://issues.apache.org/jira/browse/GOBBLIN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-752. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2617 [https://github.com/apache/incubator-gobblin/pull/2617] > Throttling server incorrectly marks permit numbers as unsatisfiable > --- > > Key: GOBBLIN-752 > URL: https://issues.apache.org/jira/browse/GOBBLIN-752 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-752) Throttling server incorrectly marks permit numbers as unsatisfiable
Issac Buenrostro created GOBBLIN-752: Summary: Throttling server incorrectly marks permit numbers as unsatisfiable Key: GOBBLIN-752 URL: https://issues.apache.org/jira/browse/GOBBLIN-752 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-749) Better access logging for throttling server
[ https://issues.apache.org/jira/browse/GOBBLIN-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-749. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2614 [https://github.com/apache/incubator-gobblin/pull/2614] > Better access logging for throttling server > --- > > Key: GOBBLIN-749 > URL: https://issues.apache.org/jira/browse/GOBBLIN-749 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-749) Better access logging for throttling server
Issac Buenrostro created GOBBLIN-749: Summary: Better access logging for throttling server Key: GOBBLIN-749 URL: https://issues.apache.org/jira/browse/GOBBLIN-749 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-724) Throttling server delays responses for throttling causing too many connections
[ https://issues.apache.org/jira/browse/GOBBLIN-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-724. -- Resolution: Fixed Fix Version/s: 0.15.0 Issue resolved by pull request #2591 [https://github.com/apache/incubator-gobblin/pull/2591] > Throttling server delays responses for throttling causing too many connections > -- > > Key: GOBBLIN-724 > URL: https://issues.apache.org/jira/browse/GOBBLIN-724 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Issac Buenrostro >Priority: Major > Fix For: 0.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently, the throttling server implements throttling in part by delaying > the response with the permit allocation. However, when waiting to respond, > the request remains in flight utilizing system resources and severely > limiting how many clients can use the throttling server. > As a fix, the server should respond immediately and ask the client to wait > before distributing the permits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-724) Throttling server delays responses for throttling causing too many connections
Issac Buenrostro created GOBBLIN-724: Summary: Throttling server delays responses for throttling causing too many connections Key: GOBBLIN-724 URL: https://issues.apache.org/jira/browse/GOBBLIN-724 Project: Apache Gobblin Issue Type: Bug Reporter: Issac Buenrostro Currently, the throttling server implements throttling in part by delaying the response with the permit allocation. However, when waiting to respond, the request remains in flight utilizing system resources and severely limiting how many clients can use the throttling server. As a fix, the server should respond immediately and ask the client to wait before distributing the permits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
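The fix proposed in GOBBLIN-724 (respond immediately and tell the client how long to wait) can be sketched as follows. This is a simplified model written for illustration, not the throttling server's actual code: the class names, the fixed-rate accounting, and the `waitMillis` field are all assumptions.

```java
/** Hypothetical response: the permits granted plus how long the client should wait before using them. */
final class PermitResponse {
    final long permits;
    final long waitMillis;

    PermitResponse(long permits, long waitMillis) {
        this.permits = permits;
        this.waitMillis = waitMillis;
    }
}

final class ImmediateResponseThrottlerSketch {
    private final double permitsPerMilli;
    private long nextFreeMillis; // earliest time at which newly requested permits become usable

    ImmediateResponseThrottlerSketch(double permitsPerSecond) {
        this.permitsPerMilli = permitsPerSecond / 1000.0;
    }

    /**
     * Never blocks: reserves the permits on the server's timeline and returns
     * at once, so no request stays in flight while the client is throttled.
     */
    synchronized PermitResponse request(long permits, long nowMillis) {
        long start = Math.max(nowMillis, nextFreeMillis);
        long costMillis = (long) Math.ceil(permits / permitsPerMilli);
        nextFreeMillis = start + costMillis;
        return new PermitResponse(permits, start - nowMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        ImmediateResponseThrottlerSketch server = new ImmediateResponseThrottlerSketch(1000.0);
        server.request(100, System.currentTimeMillis());
        PermitResponse second = server.request(50, System.currentTimeMillis());
        // The waiting happens on the client side, after the connection has been released.
        Thread.sleep(second.waitMillis);
        System.out.println("client waited " + second.waitMillis + " ms for " + second.permits + " permits");
    }
}
```

The design point is that the delay moves from the server's request handler to the client, so the number of concurrent clients is no longer bounded by the number of requests the server can hold open.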
[jira] [Created] (GOBBLIN-701) Support for secure templates that limit which keys can be overridden
Issac Buenrostro created GOBBLIN-701: Summary: Support for secure templates that limit which keys can be overridden Key: GOBBLIN-701 URL: https://issues.apache.org/jira/browse/GOBBLIN-701 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-697) Allow distcp to carry over file version independently of modtime
Issac Buenrostro created GOBBLIN-697: Summary: Allow distcp to carry over file version independently of modtime Key: GOBBLIN-697 URL: https://issues.apache.org/jira/browse/GOBBLIN-697 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro One example where this is useful is data syncing between two locations. Relying on modification times to detect data changes may lead to a feedback loop of copying: data is created at location A at time 0; at time 1 the data is copied to location B; the sync mechanism might then incorrectly conclude that, because the mod time at location B is higher, the data should be synced back to location A, and so on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
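The feedback loop described in GOBBLIN-697 can be reproduced in a few lines. The sketch below is illustrative only: `FileMeta` and both decision methods are hypothetical, and the version is assumed to be a monotonically increasing number that a copy carries over unchanged.

```java
/** Hypothetical file metadata: modification time plus a version carried over by the copy. */
final class FileMeta {
    final long modTime;
    final long version;

    FileMeta(long modTime, long version) {
        this.modTime = modTime;
        this.version = version;
    }
}

final class SyncDecisionSketch {
    /** Mod-time based detection: a fresh copy always looks newer than its source. */
    static boolean shouldCopyByModTime(FileMeta source, FileMeta target) {
        return source.modTime > target.modTime;
    }

    /** Version based detection: a copy has the same version as its source, so nothing is re-copied. */
    static boolean shouldCopyByVersion(FileMeta source, FileMeta target) {
        return source.version > target.version;
    }

    public static void main(String[] args) {
        FileMeta atA = new FileMeta(0, 1); // created at location A at time 0
        FileMeta atB = new FileMeta(1, 1); // copied to location B at time 1, version carried over

        // Syncing B back to A:
        System.out.println("by mod time: " + shouldCopyByModTime(atB, atA));
        System.out.println("by version:  " + shouldCopyByVersion(atB, atA));
    }
}
```

In this model the mod-time check asks for the file to be copied back from B to A, while the version check does not.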
[jira] [Created] (GOBBLIN-677) Allow for early termination of Gobblin jobs based on a predicate on job progress
Issac Buenrostro created GOBBLIN-677: Summary: Allow for early termination of Gobblin jobs based on a predicate on job progress Key: GOBBLIN-677 URL: https://issues.apache.org/jira/browse/GOBBLIN-677 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-566) HiveMetastoreBasedRegister incorrectly issues an alter_partition when it should do an add_partition
Issac Buenrostro created GOBBLIN-566: Summary: HiveMetastoreBasedRegister incorrectly issues an alter_partition when it should do an add_partition Key: GOBBLIN-566 URL: https://issues.apache.org/jira/browse/GOBBLIN-566 Project: Apache Gobblin Issue Type: Bug Reporter: Issac Buenrostro org.apache.gobblin.hive.metastore.HiveMetaStoreBasedRegister#addPartitionIfNotExists calls `alter_partition` if the partition does not exist. This throws an `alter is not possible` error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-471) DatasetFinderSource should allow skipping datasets
Issac Buenrostro created GOBBLIN-471: Summary: DatasetFinderSource should allow skipping datasets Key: GOBBLIN-471 URL: https://issues.apache.org/jira/browse/GOBBLIN-471 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GOBBLIN-460) Gobblin will skip all future tasks if the first n tasks complete before the n+1th is scheduled
[ https://issues.apache.org/jira/browse/GOBBLIN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro updated GOBBLIN-460: - Description: Issue with `CountUpAndDownLatch` where the latch will complete if at any point all tasks are completed, ignoring any future countUps. > Gobblin will skip all future tasks if the first n tasks complete before the > n+1th is scheduled > -- > > Key: GOBBLIN-460 > URL: https://issues.apache.org/jira/browse/GOBBLIN-460 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Issac Buenrostro >Priority: Major > > Issue with `CountUpAndDownLatch` where the latch will complete if at any > point all tasks are completed, ignoring any future countUps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
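The failure mode described in GOBBLIN-460 can be illustrated with a deliberately naive latch. This is a hypothetical class written for this example, not Gobblin's `CountUpAndDownLatch`: once its count reaches zero it is marked released for good, so a task scheduled afterwards is never waited for.

```java
final class StickyLatchSketch {
    private long count;
    private boolean released; // sticky: set once, never cleared

    synchronized void countUp() {
        count++;
    }

    synchronized void countDown() {
        count--;
        if (count == 0) {
            released = true; // the flaw: later countUp() calls do not undo this
        }
    }

    /** What a waiter would observe: true as soon as the count has ever hit zero. */
    synchronized boolean isReleased() {
        return released;
    }

    /** Number of tasks counted up but not yet counted down. */
    synchronized long pending() {
        return count;
    }

    public static void main(String[] args) {
        StickyLatchSketch latch = new StickyLatchSketch();
        latch.countUp();   // task 1 scheduled
        latch.countUp();   // task 2 scheduled
        latch.countDown(); // task 1 done
        latch.countDown(); // task 2 done, count hits zero
        latch.countUp();   // task 3 scheduled too late
        System.out.println("released=" + latch.isReleased() + ", pending=" + latch.pending());
    }
}
```

Here the latch reports itself as released while one task is still pending, which matches the symptom in the issue title: the first n tasks finish before task n+1 is scheduled, and everything after that is skipped.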
[jira] [Created] (GOBBLIN-460) Gobblin will skip all future tasks if the first n tasks complete before the n+1th is scheduled
Issac Buenrostro created GOBBLIN-460: Summary: Gobblin will skip all future tasks if the first n tasks complete before the n+1th is scheduled Key: GOBBLIN-460 URL: https://issues.apache.org/jira/browse/GOBBLIN-460 Project: Apache Gobblin Issue Type: Bug Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-440) SQLServer source uses "source.querybased.schema" as database name
Issac Buenrostro created GOBBLIN-440: Summary: SQLServer source uses "source.querybased.schema" as database name Key: GOBBLIN-440 URL: https://issues.apache.org/jira/browse/GOBBLIN-440 Project: Apache Gobblin Issue Type: Bug Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GOBBLIN-400) Allow MR Task to skip MR job execution
Issac Buenrostro created GOBBLIN-400: Summary: Allow MR Task to skip MR job execution Key: GOBBLIN-400 URL: https://issues.apache.org/jira/browse/GOBBLIN-400 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GOBBLIN-365) Add lookback days config property for CopyableGlobDatasetFinder
[ https://issues.apache.org/jira/browse/GOBBLIN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-365. -- Resolution: Fixed Issue resolved by pull request #2238 [https://github.com/apache/incubator-gobblin/pull/2238] > Add lookback days config property for CopyableGlobDatasetFinder > --- > > Key: GOBBLIN-365 > URL: https://issues.apache.org/jira/browse/GOBBLIN-365 > Project: Apache Gobblin > Issue Type: Improvement > Components: gobblin-core >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Sudarshan Vasudevan >Assignee: Abhishek Tiwari > Fix For: 0.13.0 > > > This feature adds a lookback days config property for > CopyableGlobDatasetFinder to control the number of days to go back for > distcp-ing time based data sets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-341) Fix logger name to correct class prefix after apache package change
[ https://issues.apache.org/jira/browse/GOBBLIN-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-341. -- Resolution: Fixed Issue resolved by pull request #2196 [https://github.com/apache/incubator-gobblin/pull/2196] > Fix logger name to correct class prefix after apache package change > --- > > Key: GOBBLIN-341 > URL: https://issues.apache.org/jira/browse/GOBBLIN-341 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Lei Sun >Assignee: Lei Sun > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-332) Implement fetching hive tokens in tokenUtils
[ https://issues.apache.org/jira/browse/GOBBLIN-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-332. -- Resolution: Fixed Issue resolved by pull request #2184 [https://github.com/apache/incubator-gobblin/pull/2184] > Implement fetching hive tokens in tokenUtils > > > Key: GOBBLIN-332 > URL: https://issues.apache.org/jira/browse/GOBBLIN-332 > Project: Apache Gobblin > Issue Type: Task >Reporter: Lei Sun >Assignee: Lei Sun > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-330) Generate Kerberos Principal dynamically
[ https://issues.apache.org/jira/browse/GOBBLIN-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-330. -- Resolution: Fixed Issue resolved by pull request #2182 [https://github.com/apache/incubator-gobblin/pull/2182] > Generate Kerberos Principal dynamically > --- > > Key: GOBBLIN-330 > URL: https://issues.apache.org/jira/browse/GOBBLIN-330 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Lei Sun >Assignee: Lei Sun > > For the job type that fetches tokens in the jobLauncher itself, obtain the > Kerberos principal (keytab.user) dynamically instead of setting it in configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-294) Change logging level of reflection utilities
[ https://issues.apache.org/jira/browse/GOBBLIN-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-294. -- Resolution: Fixed Issue resolved by pull request #2145 [https://github.com/apache/incubator-gobblin/pull/2145] > Change logging level of reflection utilities > --- > > Key: GOBBLIN-294 > URL: https://issues.apache.org/jira/browse/GOBBLIN-294 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Assignee: Lei Sun > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-314) Validate filesize when copying in writer
[ https://issues.apache.org/jira/browse/GOBBLIN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-314. -- Resolution: Fixed Issue resolved by pull request #2168 [https://github.com/apache/incubator-gobblin/pull/2168] > Validate filesize when copying in writer > > > Key: GOBBLIN-314 > URL: https://issues.apache.org/jira/browse/GOBBLIN-314 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Jack Moseley >Assignee: Jack Moseley > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-273) Add failure monitoring
[ https://issues.apache.org/jira/browse/GOBBLIN-273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-273. -- Resolution: Fixed Issue resolved by pull request #2125 [https://github.com/apache/incubator-gobblin/pull/2125] > Add failure monitoring > -- > > Key: GOBBLIN-273 > URL: https://issues.apache.org/jira/browse/GOBBLIN-273 > Project: Apache Gobblin > Issue Type: Task > Components: gobblin-core >Reporter: Zhixiong Chen >Assignee: Zhixiong Chen > > When a job fails with a very long log, it is not easy to dig through the log > and find the reason for the failure. Here a reporter is plugged into the > Gobblin Metrics architecture to collect job failure events into a file. A job > now has task-level and dataset-level failure events reported for free. > h3. `MetricContext#submitFailureEvent` > When a failure event needs to be reported, it should be submitted with this > method, which encapsulates the event in a `FailureEventNotification`. > h3. `FileFailureEventReporter` > Reports all failure events into a file. Each job has its own report folder. > h3. Configurations > To enable job failure reporting, the following configurations are required > {code:java} > // Some comments here > metrics.enabled=true > fs.uri= // by default, local file system is used > failure.log.dir= > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-287) Support service-level throttling quotas
[ https://issues.apache.org/jira/browse/GOBBLIN-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-287. -- Resolution: Fixed Issue resolved by pull request #2142 [https://github.com/apache/incubator-gobblin/pull/2142] > Support service-level throttling quotas > --- > > Key: GOBBLIN-287 > URL: https://issues.apache.org/jira/browse/GOBBLIN-287 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Jack Dintruff > > We would like to throttle ETL HDFS calls independently from other HDFS calls. > This could be implemented by adding the service name in both the config key > and in the resource name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-285) KafkaExtractor does not compute avgMillisPerRecord when partition pull is interrupted
[ https://issues.apache.org/jira/browse/GOBBLIN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-285. -- Resolution: Fixed Issue resolved by pull request #2138 [https://github.com/apache/incubator-gobblin/pull/2138] > KafkaExtractor does not compute avgMillisPerRecord when partition pull is > interrupted > - > > Key: GOBBLIN-285 > URL: https://issues.apache.org/jira/browse/GOBBLIN-285 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Issac Buenrostro > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-285) KafkaExtractor does not compute avgMillisPerRecord when partition pull is interrupted
Issac Buenrostro created GOBBLIN-285: Summary: KafkaExtractor does not compute avgMillisPerRecord when partition pull is interrupted Key: GOBBLIN-285 URL: https://issues.apache.org/jira/browse/GOBBLIN-285 Project: Apache Gobblin Issue Type: Bug Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-282) Support templates on Gobblin Azkaban launcher
[ https://issues.apache.org/jira/browse/GOBBLIN-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-282. -- Resolution: Fixed Issue resolved by pull request #2135 [https://github.com/apache/incubator-gobblin/pull/2135] > Support templates on Gobblin Azkaban launcher > - > > Key: GOBBLIN-282 > URL: https://issues.apache.org/jira/browse/GOBBLIN-282 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Issac Buenrostro > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-275) Use listStatus instead of globStatus for finding persisted files
[ https://issues.apache.org/jira/browse/GOBBLIN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-275. -- Resolution: Fixed Issue resolved by pull request #2128 [https://github.com/apache/incubator-gobblin/pull/2128] > Use listStatus instead of globStatus for finding persisted files > > > Key: GOBBLIN-275 > URL: https://issues.apache.org/jira/browse/GOBBLIN-275 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Jack Moseley >Assignee: Jack Moseley > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-282) Support templates on Gobblin Azkaban launcher
Issac Buenrostro created GOBBLIN-282: Summary: Support templates on Gobblin Azkaban launcher Key: GOBBLIN-282 URL: https://issues.apache.org/jira/browse/GOBBLIN-282 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-270) State Migration script
[ https://issues.apache.org/jira/browse/GOBBLIN-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-270. -- Resolution: Fixed Issue resolved by pull request #2122 [https://github.com/apache/incubator-gobblin/pull/2122] > State Migration script > -- > > Key: GOBBLIN-270 > URL: https://issues.apache.org/jira/browse/GOBBLIN-270 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Lei Sun >Assignee: Lei Sun > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-200) State store dataset cleaner using state store listing API
[ https://issues.apache.org/jira/browse/GOBBLIN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-200. -- Resolution: Fixed Issue resolved by pull request #2097 [https://github.com/apache/incubator-gobblin/pull/2097] > State store dataset cleaner using state store listing API > - > > Key: GOBBLIN-200 > URL: https://issues.apache.org/jira/browse/GOBBLIN-200 > Project: Apache Gobblin > Issue Type: Sub-task > Components: state-management >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-266) Improve Hive Task setup
Issac Buenrostro created GOBBLIN-266: Summary: Improve Hive Task setup Key: GOBBLIN-266 URL: https://issues.apache.org/jira/browse/GOBBLIN-266 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Allow: - Adding jars - Adding files - Setting properties -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-253) Hive materializer enhancements
Issac Buenrostro created GOBBLIN-253: Summary: Hive materializer enhancements Key: GOBBLIN-253 URL: https://issues.apache.org/jira/browse/GOBBLIN-253 Project: Apache Gobblin Issue Type: Improvement Reporter: Issac Buenrostro Hive materializer should not assume that the origin tables are Avro, and it should support more than just copying a table. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-241) Allow multiple datasets send different lineage event for kafka
[ https://issues.apache.org/jira/browse/GOBBLIN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-241. -- Resolution: Fixed Issue resolved by pull request #2092 [https://github.com/apache/incubator-gobblin/pull/2092] > Allow multiple datasets send different lineage event for kafka > -- > > Key: GOBBLIN-241 > URL: https://issues.apache.org/jira/browse/GOBBLIN-241 > Project: Apache Gobblin > Issue Type: Bug >Reporter: Kuai Yu >Assignee: Kuai Yu > > This task is mainly to add to or refactor the existing lineage event support. Allow > the task-level publisher to submit lineage events. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-240) Adding three more Azkaban tags
[ https://issues.apache.org/jira/browse/GOBBLIN-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-240. -- Resolution: Fixed Issue resolved by pull request #2091 [https://github.com/apache/incubator-gobblin/pull/2091] > Adding three more Azkaban tags > -- > > Key: GOBBLIN-240 > URL: https://issues.apache.org/jira/browse/GOBBLIN-240 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Jin Hyuk Chang > > Adding the following three Azkaban tags as per WIMD request. > - Azkaban Flow URL > - Azkaban Job execution URL > - Azkaban Job URL -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-181) Modify Avro2ORC flow to materialize Hive views
[ https://issues.apache.org/jira/browse/GOBBLIN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-181. -- Resolution: Fixed Issue resolved by pull request #2062 [https://github.com/apache/incubator-gobblin/pull/2062] > Modify Avro2ORC flow to materialize Hive views > -- > > Key: GOBBLIN-181 > URL: https://issues.apache.org/jira/browse/GOBBLIN-181 > Project: Apache Gobblin > Issue Type: New Feature >Reporter: Arjun Singh Bora >Assignee: Arjun Singh Bora > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-224) Gobblin doesn't support keyring based GPG file decryption
[ https://issues.apache.org/jira/browse/GOBBLIN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-224. -- Resolution: Fixed Issue resolved by pull request #2076 [https://github.com/apache/incubator-gobblin/pull/2076] > Gobblin doesn't support keyring based GPG file decryption > - > > Key: GOBBLIN-224 > URL: https://issues.apache.org/jira/browse/GOBBLIN-224 > Project: Apache Gobblin > Issue Type: New Feature > Components: gobblin-crypto >Reporter: Zixuan Liu >Assignee: Shirshanka Das > > In gobblin-crypto, GPGFileDecryptor only supports decrypting password-based > encrypted files. However, we have a keyring-based encrypted file that needs > to be decrypted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-210) Implement a source based on Dataset Finder
[ https://issues.apache.org/jira/browse/GOBBLIN-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-210. -- Resolution: Fixed Issue resolved by pull request #2063 [https://github.com/apache/incubator-gobblin/pull/2063] > Implement a source based on Dataset Finder > -- > > Key: GOBBLIN-210 > URL: https://issues.apache.org/jira/browse/GOBBLIN-210 > Project: Apache Gobblin > Issue Type: Bug > Components: gobblin-core >Reporter: Issac Buenrostro >Assignee: Issac Buenrostro > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (GOBBLIN-182) Emit Lineage Events for Query Based Sources
[ https://issues.apache.org/jira/browse/GOBBLIN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Issac Buenrostro resolved GOBBLIN-182. -- Resolution: Fixed Issue resolved by pull request #2034 [https://github.com/apache/incubator-gobblin/pull/2034] > Emit Lineage Events for Query Based Sources > --- > > Key: GOBBLIN-182 > URL: https://issues.apache.org/jira/browse/GOBBLIN-182 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Kuai Yu >Assignee: Kuai Yu > > Emit lineage events in QueryBasedSource, FsDataWriter and BasePublisher. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GOBBLIN-210) Implement a source based on Dataset Finder
Issac Buenrostro created GOBBLIN-210: Summary: Implement a source based on Dataset Finder Key: GOBBLIN-210 URL: https://issues.apache.org/jira/browse/GOBBLIN-210 Project: Apache Gobblin Issue Type: Bug Components: gobblin-core Reporter: Issac Buenrostro Assignee: Issac Buenrostro -- This message was sent by Atlassian JIRA (v6.4.14#64029)