[jira] [Resolved] (GOBBLIN-1195) Close the writer when a fork is done

2020-07-26 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1195.

Resolution: Duplicate

> Close the writer when a fork is done
> 
>
> Key: GOBBLIN-1195
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1195
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Hung Tran
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GOBBLIN-1207) Clear references to large objects in Fork, FileBasedExtractor, and HiveWritableHdfsDataWriter

2020-06-30 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1207.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3052
[https://github.com/apache/incubator-gobblin/pull/3052]

> Clear references to large objects in Fork, FileBasedExtractor, and 
> HiveWritableHdfsDataWriter
> -
>
> Key: GOBBLIN-1207
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1207
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fork, FileBasedExtractor, and HiveWritableHdfsDataWriter objects contain 
> references to objects that can be large, such as ORC reader and writer 
> buffers. Clear these references to allow the memory to be reclaimed during 
> the job execution.
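A minimal sketch of the pattern described above, assuming a component that owns one large buffer. `LargeBufferHolder` is a hypothetical class standing in for objects such as `Fork`, `FileBasedExtractor`, or `HiveWritableHdfsDataWriter`; it is not their actual code, and the buffer size is arbitrary.

```java
import java.io.Closeable;

/** Hypothetical component holding a large buffer (standing in for an ORC reader/writer buffer). */
class LargeBufferHolder implements Closeable {
  // Arbitrary 8 MiB stand-in for a large reader/writer buffer.
  private byte[] buffer = new byte[8 * 1024 * 1024];

  void write(byte value, int offset) {
    if (buffer == null) {
      throw new IllegalStateException("Already closed");
    }
    buffer[offset] = value;
  }

  boolean holdsBuffer() {
    return buffer != null;
  }

  @Override
  public void close() {
    // Release underlying resources first (omitted here), then drop the
    // strong reference so the GC can reclaim the buffer while the job is
    // still running, even if this object itself stays reachable.
    buffer = null;
  }
}
```

Clearing the field matters because the owning object may remain reachable until the job finishes; without it, the buffer would stay alive just as long.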





[jira] [Created] (GOBBLIN-1207) Clear references to large objects in Fork, FileBasedExtractor, and HiveWritableHdfsDataWriter

2020-06-29 Thread Hung Tran (Jira)
Hung Tran created GOBBLIN-1207:
--

 Summary: Clear references to large objects in Fork, 
FileBasedExtractor, and HiveWritableHdfsDataWriter
 Key: GOBBLIN-1207
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1207
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Hung Tran


Fork, FileBasedExtractor, and HiveWritableHdfsDataWriter objects contain 
references to objects that can be large, such as ORC reader and writer buffers. 
Clear these references to allow the memory to be reclaimed during the job 
execution.





[jira] [Resolved] (GOBBLIN-1202) Add retry for REST API call

2020-06-22 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1202.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3049
[https://github.com/apache/incubator-gobblin/pull/3049]

> Add retry for REST API call
> ---
>
> Key: GOBBLIN-1202
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1202
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> SFDC objects have an index on the *SystemModstamp* column.
> This index may reside on disk. When we execute
> {code:java}
> Select count(systemmodstamp) from table_name group by day_only(systemmodstamp)
> {code}
> and the index is on disk, it must be loaded first, so the call may time out.
> Retrying would resolve this.
>  
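A minimal sketch of a bounded retry with linear backoff, assuming the timeout surfaces as an unchecked exception. `QueryTimeoutException` and `TimeoutRetry.callWithRetry` are hypothetical names, not the Salesforce connector's API, and the attempt count and backoff values are illustrative.

```java
import java.util.function.Supplier;

/** Hypothetical unchecked exception standing in for a REST query timeout. */
class QueryTimeoutException extends RuntimeException {
  QueryTimeoutException(String message) {
    super(message);
  }
}

final class TimeoutRetry {
  private TimeoutRetry() {}

  /** Calls the operation up to maxAttempts times, retrying only on timeouts. */
  static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long backoffMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    QueryTimeoutException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.get();
      } catch (QueryTimeoutException e) {
        // An early attempt may time out while the index is loaded from disk;
        // a later attempt is likely to find it already loaded.
        last = e;
        if (attempt < maxAttempts) {
          sleep(backoffMillis * attempt); // linear backoff
        }
      }
    }
    throw last;
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to retry", e);
    }
  }
}
```

Only the timeout is retried; any other failure propagates immediately, since retrying it would not help the index finish loading.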





[jira] [Resolved] (GOBBLIN-1200) Fix bug when local network throttling distcp jobs

2020-06-22 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1200.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3047
[https://github.com/apache/incubator-gobblin/pull/3047]

> Fix bug when local network throttling distcp jobs
> -
>
> Key: GOBBLIN-1200
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1200
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jack Moseley
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-1195) Close the writer when a fork is done

2020-06-16 Thread Hung Tran (Jira)
Hung Tran created GOBBLIN-1195:
--

 Summary: Close the writer when a fork is done
 Key: GOBBLIN-1195
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1195
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Hung Tran








[jira] [Resolved] (GOBBLIN-1188) Fix log message for SFDC iterators

2020-06-10 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1188.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3036
[https://github.com/apache/incubator-gobblin/pull/3036]

> Fix log message for SFDC iterators
> --
>
> Key: GOBBLIN-1188
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1188
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Alex Li
>Priority: Minor
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We print the same message twice.
> Remove it from hasNext().





[jira] [Resolved] (GOBBLIN-1186) Fix SFDC source.querybased.salesforce.is.soft.deletes.pull.disabled not available for simple mode

2020-06-09 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1186.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3034
[https://github.com/apache/incubator-gobblin/pull/3034]

> Fix SFDC source.querybased.salesforce.is.soft.deletes.pull.disabled not 
> available for simple mode
> -
>
> Key: GOBBLIN-1186
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1186
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> *Problem statement*
> source.querybased.salesforce.is.soft.deletes.pull.disabled
> doesn’t work in simple mode; it works only in dynamic mode.
> The reason is that we explicitly set the key-value pair for dynamic mode:
> [https://github.com/hanghangliu/gobblin/blob/9029a89b85ef373f78d603b14d6aaa75998f3356/gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceSource.java#L327]
>  
> *Root cause*
> The extract state is blank (please see the code).
> What we set in the job file is not visible in the extractor state.
> [https://github.com/hashdoop/hashdoop-incubator-gobblin/blob/a871e5c5d6f539bcfbcc4e2850685c58dd72dd1a/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/extract/QueryBasedSource.java#L234]
>  
> *Solution:*
> Explicitly set {{soft.deletes.pull.disabled}} for simple mode, as we 
> did for dynamic mode.
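A minimal sketch of the fix, assuming the job configuration and the extract state can both be modeled as `java.util.Properties`. `SoftDeletesFlag.propagate` is a hypothetical helper, not the code from the pull request; the point is that the same copy runs in simple and dynamic mode alike, so the extractor sees the flag either way.

```java
import java.util.Properties;

/** Hypothetical helper copying the soft-deletes flag into the extract state. */
final class SoftDeletesFlag {
  static final String KEY = "source.querybased.salesforce.is.soft.deletes.pull.disabled";

  private SoftDeletesFlag() {}

  /** Copies the flag from the job properties, if set; called in both simple and dynamic mode. */
  static void propagate(Properties jobProps, Properties extractState) {
    String value = jobProps.getProperty(KEY);
    if (value != null) {
      extractState.setProperty(KEY, value);
    }
  }
}
```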





[jira] [Resolved] (GOBBLIN-1174) Fail job on FileBasedSource ls invalid source directory

2020-06-04 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1174.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3019
[https://github.com/apache/incubator-gobblin/pull/3019]

> Fail job on FileBasedSource ls invalid source directory
> ---
>
> Key: GOBBLIN-1174
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1174
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1181) Make parquet-proto dependency in gobblin-parquet-apache compileOnly

2020-06-04 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1181.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3028
[https://github.com/apache/incubator-gobblin/pull/3028]

> Make parquet-proto dependency in gobblin-parquet-apache compileOnly
> ---
>
> Key: GOBBLIN-1181
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1181
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Vikram Bohra
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 'org.apache.parquet:parquet-protobuf:1.10.1' is not resolvable from 
> MavenCentral because some of its transitive dependencies are missing. This 
> will cause the 'gobblin-parquet-apache' module _also not to be resolvable_ from 
> Central (in the future, when it is published to Central). To fix this, I 
> suggest using 'compileOnly'. This will ensure that 'gobblin-parquet-apache' 
> is cleanly resolvable from Central.
> This change unblocks automated consumption of Gobblin external artifacts at 
> LinkedIn.
> Tested by running the build and inspecting the pom files.





[jira] [Resolved] (GOBBLIN-1179) Add a typed config to replace properties

2020-06-03 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1179.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2910
[https://github.com/apache/incubator-gobblin/pull/2910]

> Add a typed config to replace properties
> 
>
> Key: GOBBLIN-1179
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1179
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a typed config to replace *_properties.get(“ini.file.userName”)_* with 
> *config.userName*.
> The Gobblin config file is an ini file. Java loads the ini file into a 
> Properties instance, and config values are read by getting them from the 
> properties:
> |workUnitState.getPropAsBoolean(BULK_API_USE_QUERY_ALL)|
> |workUnitState.getPropAsInt(FETCH_RETRY_LIMIT_KEY, DEFAULT_FETCH_RETRY_LIMIT)|
> |Math.max(MIN_SIZE, Math.min(MAX_SIZE, workUnitState.getPropAsInt(PARTITION_SIZE, DEFAULT_SIZE))); // partition size is clamped to [MIN_SIZE, MAX_SIZE]; DEFAULT_SIZE is used when unset|
> Problems:
>  # No consistent key naming model
>  * Long dot-separated key strings are used, which makes typos easy. The 
> config code is verbose: we use *properties.getProp(key, default)*.
>  * Key collisions if the same key is used in multiple places, e.g. 
> kafka.brokers
>  # No ownership management
>  * In the gobblin connector package we have multiple static constant classes: 
> GobblinKeys, QueryBaseKeys, GaapKeys, and SalesforceConnectorKeys. We can even 
> read config values directly with state.getProp(*“my.key”*) without creating any 
> constant key.
>  # No static validation
>  * Required & default value
>  * Type check
>  * Date range
>  * Enum
>  # No dependency check
>  * If users set *useGaap=true*, there must be *gaap.url* and 
> *gaap.credential*. This needs to be verified at both runtime and compile 
> time.
>  
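A minimal sketch of a typed config built once from `java.util.Properties`, assuming validation can run at job startup. `ConnectorConfig`, the `partition.size` key, and the bounds and default values are hypothetical; only `ini.file.userName`, `useGaap`, `gaap.url`, and `gaap.credential` come from the issue. The checks run when the config is built, not at compile time; what the typed fields add at compile time is that a misspelled `config.userName` will not compile, unlike a misspelled string key.

```java
import java.util.Properties;

/** Hypothetical typed view over job properties: keys declared once, values validated up front. */
final class ConnectorConfig {
  // Hypothetical bounds/default; the issue only names MIN_SIZE, MAX_SIZE and DEFAULT_SIZE.
  static final int MIN_PARTITION_SIZE = 1;
  static final int MAX_PARTITION_SIZE = 100_000;
  static final int DEFAULT_PARTITION_SIZE = 1_000;

  final String userName;
  final int partitionSize;
  final boolean useGaap;
  final String gaapUrl; // null unless useGaap is true

  private ConnectorConfig(String userName, int partitionSize, boolean useGaap, String gaapUrl) {
    this.userName = userName;
    this.partitionSize = partitionSize;
    this.useGaap = useGaap;
    this.gaapUrl = gaapUrl;
  }

  /** Parses and validates everything once, failing fast with the offending key. */
  static ConnectorConfig from(Properties props) {
    String userName = require(props, "ini.file.userName");

    // Type check, then clamp into [MIN, MAX] as in the getPropAsInt example.
    String rawSize = props.getProperty("partition.size", String.valueOf(DEFAULT_PARTITION_SIZE));
    int size;
    try {
      size = Integer.parseInt(rawSize.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("partition.size must be an integer: " + rawSize, e);
    }
    int partitionSize = Math.max(MIN_PARTITION_SIZE, Math.min(MAX_PARTITION_SIZE, size));

    // Dependency check: useGaap=true requires gaap.url and gaap.credential.
    boolean useGaap = Boolean.parseBoolean(props.getProperty("useGaap", "false"));
    String gaapUrl = null;
    if (useGaap) {
      gaapUrl = require(props, "gaap.url");
      require(props, "gaap.credential"); // validated but not stored in this sketch
    }
    return new ConnectorConfig(userName, partitionSize, useGaap, gaapUrl);
  }

  private static String require(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("Missing required config: " + key);
    }
    return value.trim();
  }
}
```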





[jira] [Resolved] (GOBBLIN-1180) Remove dependency on gobblin-parquet

2020-06-03 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1180.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3026
[https://github.com/apache/incubator-gobblin/pull/3026]

> Remove dependency on gobblin-parquet
> 
>
> Key: GOBBLIN-1180
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1180
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Vikram Bohra
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> gobblin-parquet depends on com.twitter parquet libraries.
> 'com.twitter:parquet-protobuf:1.5.0' (including higher versions) is not 
> resolvable from MavenCentral because some of its transitive dependencies are 
> missing. This makes the 'gobblin-parquet' module _also not resolvable_ 
> from MavenCentral (e.g. 'org.apache.gobblin:gobblin-parquet:0.14.0' does not 
> resolve from Central as of today).
> To fix this, I suggest using "gobblin-parquet-apache" (which uses the org.apache 
> parquet libraries) instead of "gobblin-parquet".
> This also unblocks automated consumption of Gobblin external artifacts at 
> LinkedIn.
> After this change, the gobblin-parquet module can safely be removed.





[jira] [Resolved] (GOBBLIN-1176) Create gobblin-all module

2020-06-02 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1176.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #3021
[https://github.com/apache/incubator-gobblin/pull/3021]

> Create gobblin-all module 
> --
>
> Key: GOBBLIN-1176
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1176
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Vikram Bohra
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1146) Allow configuring autocommit in JDBCWriters

2020-05-12 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1146.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2984
[https://github.com/apache/incubator-gobblin/pull/2984]

> Allow configuring autocommit in JDBCWriters
> ---
>
> Key: GOBBLIN-1146
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1146
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>






[jira] [Resolved] (GOBBLIN-1142) Hive Distcp support filter on partitioned or snapshot table

2020-05-07 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1142.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2979
[https://github.com/apache/incubator-gobblin/pull/2979]

> Hive Distcp support filter on partitioned or snapshot table
> ---
>
> Key: GOBBLIN-1142
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1142
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This change adds support for filtering specific types of tables (e.g. snapshot or 
> partitioned tables) in `HiveDatasetFinder`.
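A minimal sketch of type-based filtering, assuming a dataset finder can apply a predicate to each table it discovers. `TableInfo`, `TableTypeFilter`, and the definition of a snapshot table as one without partition columns are assumptions for illustration, not `HiveDatasetFinder`'s actual code or configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified table descriptor (not the Hive metastore Table class). */
final class TableInfo {
  final String name;
  final boolean partitioned;

  TableInfo(String name, boolean partitioned) {
    this.name = name;
    this.partitioned = partitioned;
  }
}

final class TableTypeFilter {
  enum TableType { SNAPSHOT, PARTITIONED }

  private TableTypeFilter() {}

  /** Maps a configured table type to a predicate over tables. */
  static Predicate<TableInfo> forType(TableType type) {
    switch (type) {
      case PARTITIONED:
        return t -> t.partitioned;
      case SNAPSHOT:
        // Assumption: a snapshot table is one without partition columns.
        return t -> !t.partitioned;
      default:
        throw new IllegalArgumentException("Unsupported table type: " + type);
    }
  }

  /** Keeps only the tables matching the predicate, preserving order. */
  static List<TableInfo> filter(List<TableInfo> tables, Predicate<TableInfo> keep) {
    List<TableInfo> result = new ArrayList<>();
    for (TableInfo table : tables) {
      if (keep.test(table)) {
        result.add(table);
      }
    }
    return result;
  }
}
```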





[jira] [Resolved] (GOBBLIN-1132) move logic of requester list validation to RequesterService implementation

2020-04-28 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1132.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2969
[https://github.com/apache/incubator-gobblin/pull/2969]

> move logic of requester list validation to RequesterService implementation
> --
>
> Key: GOBBLIN-1132
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1132
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1101) Enhance bulk api retry for ExceedQuota

2020-04-07 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1101.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2942
[https://github.com/apache/incubator-gobblin/pull/2942]

> Enhance bulk api retry for ExceedQuota
> --
>
> Key: GOBBLIN-1101
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1101
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> 1. ExceedQuota exception
> Below is the SFDC documentation about ExceedQuota:
> {code:java}
> One of the limits customers frequently reach is the concurrent request limit. 
> Once a synchronous Apex request runs longer than 5 seconds, it begins 
> counting against this limit. Each organization is allowed 10 concurrent 
> long-running requests. If the limit is reached, any new synchronous Apex 
> request results in a runtime exception. This behavior occurs until the 
> organization’s requests are below the limit.
> {code}
> If the ExceedQuota exception happens, we should let the thread sleep for 5 
> minutes and try again. There should not be a retryLimit for this exception.
> 2. Exception stack in the log file
> For example, if we set retryLimit to 10 and all 10 retries fail, the exception 
> stack printed in the log file contains all 10 exceptions:
> SSL Exception (root cause) -> retry and get ExceedQuota -> retry and 
> get ExceedQuota -> ... (many times)
> We should skip all the retry exceptions and keep only the root cause exception.
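A minimal sketch of both requested behaviors, assuming quota errors surface as a distinct unchecked exception. `ExceedQuotaException` and `BulkApiRetry.call` are hypothetical names, not the Salesforce connector's actual classes or method signatures.

```java
import java.util.function.Supplier;

/** Hypothetical unchecked exception standing in for SFDC's ExceedQuota error. */
class ExceedQuotaException extends RuntimeException {
  ExceedQuotaException(String message) {
    super(message);
  }
}

final class BulkApiRetry {
  private BulkApiRetry() {}

  /**
   * Retries ExceedQuota without limit, waiting quotaWaitMillis each time, and
   * retries any other RuntimeException up to retryLimit times. When the limit
   * is exceeded, rethrows the first non-quota failure (the root cause) rather
   * than a chain of retry exceptions.
   */
  static <T> T call(Supplier<T> operation, int retryLimit, long quotaWaitMillis, long retryWaitMillis) {
    RuntimeException rootCause = null;
    int failures = 0;
    while (true) {
      try {
        return operation.get();
      } catch (ExceedQuotaException e) {
        // Concurrent-request limit reached: wait and try again.
        // Not counted against retryLimit.
        sleep(quotaWaitMillis);
      } catch (RuntimeException e) {
        if (rootCause == null) {
          rootCause = e; // remember only the first failure
        }
        failures++;
        if (failures > retryLimit) {
          throw rootCause;
        }
        sleep(retryWaitMillis);
      }
    }
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to retry", e);
    }
  }
}
```

With the 5-minute wait from the issue, `quotaWaitMillis` would be `5 * 60 * 1000L`. Because quota retries are unbounded by design, a caller may still want an overall deadline.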





[jira] [Resolved] (GOBBLIN-1108) bump up mysql-connector

2020-04-03 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1108.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2948
[https://github.com/apache/incubator-gobblin/pull/2948]

> bump up mysql-connector
> ---
>
> Key: GOBBLIN-1108
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1108
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1103) Add High level consumer doc to Latest news

2020-04-03 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1103.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #1
[https://github.com/apache/incubator-gobblin-site/pull/1]

> Add High level consumer doc to Latest news
> --
>
> Key: GOBBLIN-1103
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1103
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Vikram Bohra
>Priority: Trivial
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1100) Set average fetch time in the KafkaExtractor even when metrics are disabled

2020-03-27 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1100.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2941
[https://github.com/apache/incubator-gobblin/pull/2941]

> Set average fetch time in the KafkaExtractor even when metrics are disabled
> ---
>
> Key: GOBBLIN-1100
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1100
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The KafkaExtractor close method only calls the 
> KafkaExtractorStatsTracker#emitTrackingEvents method when metrics 
> instrumentation is enabled. This method calls 
> KafkaExtractorStatsTracker#generateTagsForPartitions, which has the side 
> effect of setting the average fetch time in the WorkUnitState. The average 
> fetch time is required for the work unit packing to pack optimally.
>  
> Add a call to KafkaExtractorStatsTracker#generateTagsForPartitions in 
> KafkaExtractor#close when metrics instrumentation is disabled to restore the 
> behavior that existed prior to the KafkaExtractorStatsTracker refactoring.
>  
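A minimal sketch of the `close()` logic the issue describes, assuming the work unit state can be modeled as a string map. `SketchStatsTracker`, `SketchExtractor`, and the `avgFetchTimeMillis` key are hypothetical stand-ins; only the method names `emitTrackingEvents` and `generateTagsForPartitions` are borrowed from the issue text, and their real signatures may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for KafkaExtractorStatsTracker; only the side effect matters here. */
class SketchStatsTracker {
  static final String AVG_FETCH_TIME_KEY = "avgFetchTimeMillis"; // hypothetical key
  private final Map<String, String> workUnitState;
  private final double avgFetchTimeMillis;
  int trackingEventsEmitted = 0;

  SketchStatsTracker(Map<String, String> workUnitState, double avgFetchTimeMillis) {
    this.workUnitState = workUnitState;
    this.avgFetchTimeMillis = avgFetchTimeMillis;
  }

  /** Builds tags and, as a side effect, records the average fetch time in the work unit state. */
  Map<String, String> generateTagsForPartitions() {
    String avg = String.valueOf(avgFetchTimeMillis);
    workUnitState.put(AVG_FETCH_TIME_KEY, avg);
    Map<String, String> tags = new HashMap<>();
    tags.put(AVG_FETCH_TIME_KEY, avg);
    return tags;
  }

  void emitTrackingEvents() {
    generateTagsForPartitions();
    // A real tracker would submit an event with the tags; here we only count.
    trackingEventsEmitted++;
  }
}

/** Hypothetical extractor showing the close() branch described in the issue. */
class SketchExtractor {
  private final SketchStatsTracker tracker;
  private final boolean metricsEnabled;

  SketchExtractor(SketchStatsTracker tracker, boolean metricsEnabled) {
    this.tracker = tracker;
    this.metricsEnabled = metricsEnabled;
  }

  void close() {
    if (metricsEnabled) {
      // Emitting tracking events also sets the average fetch time.
      tracker.emitTrackingEvents();
    } else {
      // Metrics are off, but work unit packing still needs the average fetch time.
      tracker.generateTagsForPartitions();
    }
  }
}
```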





[jira] [Created] (GOBBLIN-1100) Set average fetch time in the KafkaExtractor even when metrics are disabled

2020-03-27 Thread Hung Tran (Jira)
Hung Tran created GOBBLIN-1100:
--

 Summary: Set average fetch time in the KafkaExtractor even when 
metrics are disabled
 Key: GOBBLIN-1100
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1100
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Hung Tran


The KafkaExtractor close method only calls the 
KafkaExtractorStatsTracker#emitTrackingEvents method when metrics 
instrumentation is enabled. This method calls 
KafkaExtractorStatsTracker#generateTagsForPartitions, which has the side effect 
of setting the average fetch time in the WorkUnitState. The average fetch time 
is required for the work unit packing to pack optimally.

 

Add a call to KafkaExtractorStatsTracker#generateTagsForPartitions in 
KafkaExtractor#close when metrics instrumentation is disabled to restore the 
behavior that existed prior to the KafkaExtractorStatsTracker refactoring.

 





[jira] [Resolved] (GOBBLIN-1098) Remove commons-lang and slf4j from the orc-dep fat jar

2020-03-26 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1098.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2939
[https://github.com/apache/incubator-gobblin/pull/2939]

> Remove commons-lang and slf4j from the orc-dep fat jar
> --
>
> Key: GOBBLIN-1098
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1098
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The commons-lang and slf4j libraries are commonly used dependencies and are likely to 
> result in conflicts when included in a fat jar without shading. Remove these 
> dependencies from the orc-dep fat jar to avoid conflicts.





[jira] [Created] (GOBBLIN-1098) Remove commons-lang and slf4j from the orc-dep fat jar

2020-03-26 Thread Hung Tran (Jira)
Hung Tran created GOBBLIN-1098:
--

 Summary: Remove commons-lang and slf4j from the orc-dep fat jar
 Key: GOBBLIN-1098
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1098
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Hung Tran


The commons-lang and slf4j libraries are commonly used dependencies and are likely to 
result in conflicts when included in a fat jar without shading. Remove these 
dependencies from the orc-dep fat jar to avoid conflicts.





[jira] [Resolved] (GOBBLIN-1096) Work with DST change in compaction watermark

2020-03-25 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1096.

Fix Version/s: 0.15.0
   Resolution: Fixed

> Work with DST change in compaction watermark
> 
>
> Key: GOBBLIN-1096
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1096
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1097) ResultChainingIterator.add should check if the argument iterator is null

2020-03-24 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1097.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2938
[https://github.com/apache/incubator-gobblin/pull/2938]

> ResultChainingIterator.add should check if the argument iterator is null
> 
>
> Key: GOBBLIN-1097
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1097
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ResultChainingIterator.add should check whether the argument iterator is null.
> Currently it fails if the argument is null.
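A minimal sketch of a null-tolerant `add`, assuming that silently skipping a null iterator is acceptable (throwing an `IllegalArgumentException` would be the stricter alternative). `ChainingIterator` is a hypothetical class, not Gobblin's `ResultChainingIterator`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical iterator that yields the elements of several iterators in sequence. */
class ChainingIterator<T> implements Iterator<T> {
  private final List<Iterator<T>> iterators = new ArrayList<>();
  private int current = 0;

  /** Appends an iterator; a null argument is ignored instead of causing a later failure. */
  void add(Iterator<T> iterator) {
    if (iterator != null) {
      iterators.add(iterator);
    }
  }

  @Override
  public boolean hasNext() {
    // Advance past exhausted iterators until one still has elements.
    while (current < iterators.size()) {
      if (iterators.get(current).hasNext()) {
        return true;
      }
      current++;
    }
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return iterators.get(current).next();
  }
}
```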





[jira] [Comment Edited] (GOBBLIN-1079) use config extract.is.full

2020-03-11 Thread Hung Tran (Jira)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057375#comment-17057375
 ] 

Hung Tran edited comment on GOBBLIN-1079 at 3/11/20, 8:55 PM:
--

PR 2918 merged


was (Author: hutran):
PR merged

> use config extract.is.full
> --
>
> Key: GOBBLIN-1079
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1079
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1079) use config extract.is.full

2020-03-11 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1079.

Fix Version/s: 0.15.0
   Resolution: Fixed

PR merged

> use config extract.is.full
> --
>
> Key: GOBBLIN-1079
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1079
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Arjun Singh Bora
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1065) Fix Merge Script failure for SSL in MacOS

2020-03-04 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1065.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2903
[https://github.com/apache/incubator-gobblin/pull/2903]

> Fix Merge Script failure for SSL in MacOS
> -
>
> Key: GOBBLIN-1065
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1065
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1066) field projection with namespace

2020-03-02 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1066.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2904
[https://github.com/apache/incubator-gobblin/pull/2904]

> field projection with namespace
> ---
>
> Key: GOBBLIN-1066
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1066
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> `AvroProjectionConverter` currently ignores the extract namespace when identifying 
> the fields to remove for a table. This change takes the namespace into account 
> when identifying fields to remove, controlled by a configuration option.
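A minimal sketch of a namespace-aware lookup, assuming the fields to remove are configured per table and keyed as `namespace.table` when the option is enabled. `FieldRemovalLookup` and this key format are assumptions for illustration, not `AvroProjectionConverter`'s actual configuration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup of fields to remove, optionally keyed by namespace. */
final class FieldRemovalLookup {
  private final Map<String, Set<String>> fieldsToRemove;
  private final boolean useNamespace;

  FieldRemovalLookup(Map<String, Set<String>> fieldsToRemove, boolean useNamespace) {
    this.fieldsToRemove = new HashMap<>(fieldsToRemove);
    this.useNamespace = useNamespace;
  }

  Set<String> fieldsFor(String namespace, String table) {
    // With the option on, "db1.members" and "db2.members" are distinct entries;
    // with it off, only the table name is used, as before the change.
    String key = useNamespace ? namespace + "." + table : table;
    return fieldsToRemove.getOrDefault(key, Collections.emptySet());
  }
}
```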





[jira] [Resolved] (GOBBLIN-1057) Remove unnecessary RPCs in distcp-ng

2020-02-24 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1057.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2897
[https://github.com/apache/incubator-gobblin/pull/2897]

> Remove unnecessary RPCs in distcp-ng
> 
>
> Key: GOBBLIN-1057
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There are some per-file FileSystem RPCs being invoked in Gobblin distcp-ng.
> This results in a long file discovery phase that can take hours for a few 
> thousand files.
> The RPCs that can be removed are:
> getFileChecksum() - the value doesn't appear to be used.
> getFileStatus() - this is called to get the modification time in 
> ModTimeDataFileVersionStrategy.getVersion(). The modification time is already 
> available from listStatus(), so use that value.
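In Hadoop, `FileSystem.listStatus(Path)` returns `FileStatus` objects that already carry `getModificationTime()`, which is why the per-file `getFileStatus()` call is redundant. The sketch below models this with a hypothetical in-memory file system that counts "RPCs" (`ListedFile`, `CountingFileSystem`, and `ModTimeVersions` are not Hadoop or Gobblin classes), to show the call count dropping from one-plus-N to one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listing entry; Hadoop's FileStatus carries similar data. */
final class ListedFile {
  final String path;
  final long modificationTime;

  ListedFile(String path, long modificationTime) {
    this.path = path;
    this.modificationTime = modificationTime;
  }
}

/** Hypothetical in-memory file system that counts each call as one RPC. */
final class CountingFileSystem {
  private final List<ListedFile> files;
  int rpcCount = 0;

  CountingFileSystem(List<ListedFile> files) {
    this.files = files;
  }

  List<ListedFile> listStatus() {
    rpcCount++;
    return new ArrayList<>(files);
  }

  ListedFile getFileStatus(String path) {
    rpcCount++;
    for (ListedFile f : files) {
      if (f.path.equals(path)) {
        return f;
      }
    }
    throw new IllegalArgumentException("No such file: " + path);
  }
}

final class ModTimeVersions {
  private ModTimeVersions() {}

  /** Before: one listing RPC plus one getFileStatus() RPC per file. */
  static List<Long> withPerFileRpc(CountingFileSystem fs) {
    List<Long> versions = new ArrayList<>();
    for (ListedFile f : fs.listStatus()) {
      versions.add(fs.getFileStatus(f.path).modificationTime);
    }
    return versions;
  }

  /** After: reuse the modification time already returned by listStatus(). */
  static List<Long> fromListing(CountingFileSystem fs) {
    List<Long> versions = new ArrayList<>();
    for (ListedFile f : fs.listStatus()) {
      versions.add(f.modificationTime);
    }
    return versions;
  }
}
```

For a few thousand files, the first version issues a few thousand extra round trips to the NameNode, while the second issues none beyond the listing itself.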





[jira] [Updated] (GOBBLIN-1057) Remove unnecessary RPCs in distcp-ng

2020-02-24 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-1057:
---
Summary: Remove unnecessary RPCs in distcp-ng  (was: Optimize unnecessary 
RPCs in distcp-ng)

> Remove unnecessary RPCs in distcp-ng
> 
>
> Key: GOBBLIN-1057
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Hung Tran
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some per-file FileSystem RPCs being invoked in Gobblin distcp-ng.
> This results in a long file discovery phase that can take hours for a few 
> thousand files.
> The RPCs that can be removed are:
> getFileChecksum() - the value doesn't appear to be used.
> getFileStatus() - this is called to get the modification time in 
> ModTimeDataFileVersionStrategy.getVersion(). The modification time is already 
> available from listStatus(), so use that value.





[jira] [Created] (GOBBLIN-1057) Optimize unnecessary RPCs in distcp-ng

2020-02-24 Thread Hung Tran (Jira)
Hung Tran created GOBBLIN-1057:
--

 Summary: Optimize unnecessary RPCs in distcp-ng
 Key: GOBBLIN-1057
 URL: https://issues.apache.org/jira/browse/GOBBLIN-1057
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Hung Tran


There are some per-file FileSystem RPCs being invoked in Gobblin distcp-ng.

This results in a long file discovery phase that can take hours for a few 
thousand files.

The RPCs that can be removed are:

getFileChecksum() - the value doesn't appear to be used.

getFileStatus() - this is called to get the modification time in 
ModTimeDataFileVersionStrategy.getVersion(). The modification time is already 
available from listStatus(), so use that value.





[jira] [Resolved] (GOBBLIN-1056) Allow customizing client pool population in KafkaSource

2020-02-21 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1056.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2896
[https://github.com/apache/incubator-gobblin/pull/2896]

> Allow customizing client pool population in KafkaSource
> ---
>
> Key: GOBBLIN-1056
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1056
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Move the existing consumer client pool population logic into a method,
> `populateClientPool`. This allows the clients created for the pool to carry
> additional information from the client created to fetch topics.
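The refactoring described above is an instance of the overridable-hook (template method) pattern. A minimal Java sketch follows; `ConsumerClient`, `KafkaSourceSketch`, `CustomKafkaSourceSketch`, and the parameters of `populateClientPool` are hypothetical, and only the method name comes from the issue. It is not Gobblin's actual `KafkaSource` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal consumer client carrying some configuration. */
class ConsumerClient {
  final Map<String, String> props;

  ConsumerClient(Map<String, String> props) {
    this.props = new HashMap<>(props);
  }
}

/** Hypothetical source in which pool population is an overridable hook. */
class KafkaSourceSketch {
  protected final List<ConsumerClient> clientPool = new ArrayList<>();

  void initialize(int poolSize, Map<String, String> topicFetcherProps) {
    // Client used to fetch the topic list; subclasses may want its settings.
    ConsumerClient topicFetcher = new ConsumerClient(topicFetcherProps);
    populateClientPool(poolSize, topicFetcher);
  }

  /** Default behavior: pool clients are created with default settings only. */
  protected void populateClientPool(int poolSize, ConsumerClient topicFetcher) {
    for (int i = 0; i < poolSize; i++) {
      clientPool.add(new ConsumerClient(new HashMap<>()));
    }
  }
}

/** Hypothetical subclass whose pool clients inherit the topic fetcher's settings. */
class CustomKafkaSourceSketch extends KafkaSourceSketch {
  @Override
  protected void populateClientPool(int poolSize, ConsumerClient topicFetcher) {
    for (int i = 0; i < poolSize; i++) {
      clientPool.add(new ConsumerClient(topicFetcher.props));
    }
  }
}
```

Keeping the default behavior in the base class means existing sources are unaffected; only subclasses that need the extra information override the hook.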





[jira] [Resolved] (GOBBLIN-1045) Emit more events in compaction job

2020-02-18 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1045.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2885
[https://github.com/apache/incubator-gobblin/pull/2885]

> Emit more events in compaction job
> --
>
> Key: GOBBLIN-1045
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1045
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Emit count events for the following items in the compaction job:
> - number of files, corresponding to hive metadata "numFiles"
> - record count, corresponding to hive metadata "numRows"
> - bytes written, corresponding to hive metadata "totalSize"
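As a rough sketch, the three counts could be derived from the compaction output as shown below. `CompactionOutputFile` and `CompactionCounts` are hypothetical names, and attaching the values to an emitted event is not shown; only the keys `numFiles`, `numRows`, and `totalSize` come from the issue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical description of one file written by a compaction job. */
final class CompactionOutputFile {
  final long recordCount;
  final long sizeInBytes;

  CompactionOutputFile(long recordCount, long sizeInBytes) {
    this.recordCount = recordCount;
    this.sizeInBytes = sizeInBytes;
  }
}

final class CompactionCounts {
  /** Aggregates output files into counts keyed by the matching Hive metadata names. */
  static Map<String, Long> summarize(List<CompactionOutputFile> files) {
    long rows = 0;
    long bytes = 0;
    for (CompactionOutputFile f : files) {
      rows += f.recordCount;
      bytes += f.sizeInBytes;
    }
    Map<String, Long> counts = new LinkedHashMap<>();
    counts.put("numFiles", (long) files.size());
    counts.put("numRows", rows);
    counts.put("totalSize", bytes);
    return counts;
  }
}
```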





[jira] [Resolved] (GOBBLIN-1049) Move workunit commit logic to the end of publish().

2020-02-13 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1049.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2889
[https://github.com/apache/incubator-gobblin/pull/2889]

> Move workunit commit logic to the end of publish().
> ---
>
> Key: GOBBLIN-1049
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1049
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should not blindly commit workunits in the BaseDataPublisher.





[jira] [Resolved] (GOBBLIN-1046) Make ORC-Conversion output subdirectory configurable

2020-02-12 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1046.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2886
[https://github.com/apache/incubator-gobblin/pull/2886]

> Make ORC-Conversion output subdirectory configurable
> 
>
> Key: GOBBLIN-1046
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1046
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-1025) Add retry for PK-Chunking iterator

2020-01-29 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1025.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2868
[https://github.com/apache/incubator-gobblin/pull/2868]

> Add retry for PK-Chunking iterator
> --
>
> Key: GOBBLIN-1025
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1025
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> In the SFDC connector, there is a class called `ResultIterator` (I will rename
> it to SalesforceRecordIterator).
> It is currently used only by PK-Chunking. It encapsulates fetching a list of
> result files into a record iterator.
> However, csvReader.nextRecord() may throw a network IO exception. We should
> retry in this case.
> When a result file has been partially fetched and a network IO exception
> occurs, we are in a special situation: the first part of the file has already
> been fetched locally, but the rest of the file is still on the data source.
> We need to
> 1. reopen the file stream
> 2. skip all the records that we already fetched, moving the cursor to the
> first record we haven't fetched yet.
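The reopen-and-skip recovery described above can be sketched in Java as follows. This is not Gobblin's `SalesforceRecordIterator`: the `StreamOpener` interface, treating each text line as one record, and the `maxRetries` limit are assumptions made to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical hook that (re)opens a remote result file from its beginning. */
interface StreamOpener {
  BufferedReader open() throws IOException;
}

/**
 * Iterates over the lines (records) of a result file. On an IOException it
 * reopens the stream and skips the records already returned, up to maxRetries times.
 */
final class RetryingRecordIterator implements Iterator<String> {
  private final StreamOpener opener;
  private final int maxRetries;
  private BufferedReader reader;
  private long recordsReturned = 0; // records already handed to the caller
  private String nextRecord;        // prefetched record, or null at end of file
  private boolean prefetched = false;

  RetryingRecordIterator(StreamOpener opener, int maxRetries) {
    this.opener = opener;
    this.maxRetries = maxRetries;
  }

  @Override
  public boolean hasNext() {
    if (!prefetched) {
      nextRecord = readWithRetry();
      prefetched = true;
    }
    return nextRecord != null;
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    prefetched = false;
    recordsReturned++;
    return nextRecord;
  }

  private String readWithRetry() {
    int failures = 0;
    while (true) {
      try {
        if (reader == null) {
          reader = opener.open();
          // Seek past the records that were fetched before the failure.
          for (long i = 0; i < recordsReturned; i++) {
            if (reader.readLine() == null) {
              return null; // file is shorter than expected; treat as end of data
            }
          }
        }
        return reader.readLine();
      } catch (IOException e) {
        closeQuietly();
        reader = null; // force a reopen on the next attempt
        if (++failures > maxRetries) {
          throw new UncheckedIOException(e);
        }
      }
    }
  }

  private void closeQuietly() {
    try {
      if (reader != null) {
        reader.close();
      }
    } catch (IOException ignored) {
      // Best effort: the stream is being abandoned anyway.
    }
  }
}
```

Skipping by record count assumes the data source returns the records of a result file in the same order on every read.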





[jira] [Resolved] (GOBBLIN-1008) Upgrade Parquet dependency from twitter artifact to org.apache artifact

2019-12-24 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1008.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2853
[https://github.com/apache/incubator-gobblin/pull/2853]

> Upgrade Parquet dependency from twitter artifact to org.apache artifact
> ---
>
> Key: GOBBLIN-1008
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1008
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Shirshanka Das
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Parquet dependency in Gobblin pulls in an old (2014) version from Twitter.
> Filing this Jira to upgrade the dependency to an org.apache Parquet version.





[jira] [Resolved] (GOBBLIN-1011) Adjust compaction flow to work with virtual partition

2019-12-23 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1011.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2856
[https://github.com/apache/incubator-gobblin/pull/2856]

> Adjust compaction flow to work with virtual partition
> -
>
> Key: GOBBLIN-1011
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1011
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> - Update the existing `CompactionVerifier`s and `CompactionCompleteAction`s to
> work properly with virtual simple file system datasets
> - Improve ser/de of `FileSystemDataset` in `CompactionSuiteBase`
> - Update gobblin-hive-registration to work with table parameters properly





[jira] [Resolved] (GOBBLIN-1001) Implement TimePartitionGlobFinder

2019-12-20 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-1001.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2846
[https://github.com/apache/incubator-gobblin/pull/2846]

> Implement TimePartitionGlobFinder
> -
>
> Key: GOBBLIN-1001
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1001
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-988) Create Local FS Job Status Retriever

2019-12-20 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-988.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2834
[https://github.com/apache/incubator-gobblin/pull/2834]

> Create Local FS Job Status Retriever
> 
>
> Key: GOBBLIN-988
> URL: https://issues.apache.org/jira/browse/GOBBLIN-988
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: William Lo
>Priority: Minor
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> GaaS currently supports only Kafka for monitoring job status in the dagManager.
> Create a simple job status monitor that can track completed jobs in Gobblin
> standalone mode.





[jira] [Resolved] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-12-11 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-865.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2722
[https://github.com/apache/incubator-gobblin/pull/2722]

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
> Fix For: 0.15.0
>
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a
> giant query into multiple subqueries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user specified partition (set up time range in job file)
> However, fetching full data for tables such as Task and Contract fails from
> time to time.
> We may want to use PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]
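A small helper for building the PK-chunking request header is sketched below. The header name and the `chunkSize`/`startRow` options follow our reading of the Salesforce page linked above, as does the 250,000 upper bound on chunk size; verify both against the API version in use. `PkChunking` and `headerValue` are hypothetical names, not part of Gobblin or the Salesforce SDK.

```java
/** Hypothetical helper that builds the value of the PK-chunking request header. */
final class PkChunking {
  // Header name as given in the Salesforce Bulk API documentation linked above.
  static final String HEADER_NAME = "Sforce-Enable-PKChunking";
  // Upper bound on chunkSize per that documentation (assumption: check your API version).
  static final int MAX_CHUNK_SIZE = 250_000;

  /**
   * @param chunkSize number of records per chunk
   * @param startRow  optional record ID of the first chunk's lower bound, or null
   */
  static String headerValue(int chunkSize, String startRow) {
    if (chunkSize <= 0 || chunkSize > MAX_CHUNK_SIZE) {
      throw new IllegalArgumentException(
          "chunkSize must be in [1, " + MAX_CHUNK_SIZE + "]: " + chunkSize);
    }
    StringBuilder value = new StringBuilder("chunkSize=").append(chunkSize);
    if (startRow != null && !startRow.isEmpty()) {
      value.append("; startRow=").append(startRow);
    }
    return value.toString();
  }
}
```

Each chunk then becomes its own batch on the Salesforce side, which is what lets a large table such as Task be fetched as several smaller result sets.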





[jira] [Resolved] (GOBBLIN-995) Add function to instantiate the BulkConnection in SFDC connector

2019-12-05 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-995.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2840
[https://github.com/apache/incubator-gobblin/pull/2840]

> Add function to instantiate the BulkConnection in SFDC connector
> 
>
> Key: GOBBLIN-995
> URL: https://issues.apache.org/jira/browse/GOBBLIN-995
> Project: Apache Gobblin
>  Issue Type: New Feature
>Reporter: Alex Li
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In SalesforceExtractor class, we instantiated BulkConnection directly.
> {code:java}
>   this.bulkConnection = new BulkConnection(config);
> {code}
> This code makes it impossible to inject a customized BulkConnection.
> In contrast, httpClient was instantiated in a method, so we could extend the
> class and override that method to return a customized httpClient
> (GaapHttPClient in our case).
> We should add a method to instantiate the BulkConnection.
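The fix described above is the factory-method pattern. A minimal Java sketch follows; `SketchConfig`, `SketchBulkConnection`, `SketchExtractor`, `CustomSketchExtractor`, and `createBulkConnection` are hypothetical stand-ins for the real `ConnectorConfig`, `BulkConnection`, and `SalesforceExtractor`, whose actual signatures are not reproduced here.

```java
/** Hypothetical stand-in for the connector configuration. */
final class SketchConfig {
  final String endpoint;

  SketchConfig(String endpoint) {
    this.endpoint = endpoint;
  }
}

/** Hypothetical stand-in for BulkConnection. */
class SketchBulkConnection {
  final SketchConfig config;

  SketchBulkConnection(SketchConfig config) {
    this.config = config;
  }

  String describe() {
    return "default bulk connection to " + config.endpoint;
  }
}

/** Extractor that obtains its connection from an overridable factory method. */
class SketchExtractor {
  protected final SketchBulkConnection bulkConnection;

  SketchExtractor(SketchConfig config) {
    // Instead of `new SketchBulkConnection(config)`, delegate to a hook.
    this.bulkConnection = createBulkConnection(config);
  }

  protected SketchBulkConnection createBulkConnection(SketchConfig config) {
    return new SketchBulkConnection(config);
  }
}

/** Subclass that injects a customized connection. */
class CustomSketchExtractor extends SketchExtractor {
  CustomSketchExtractor(SketchConfig config) {
    super(config);
  }

  @Override
  protected SketchBulkConnection createBulkConnection(SketchConfig config) {
    return new SketchBulkConnection(config) {
      @Override
      String describe() {
        return "custom bulk connection to " + this.config.endpoint;
      }
    };
  }
}
```

Because the superclass constructor calls the factory method, an override runs before the subclass's own fields are initialized, so it should depend only on its arguments.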





[jira] [Resolved] (GOBBLIN-987) JsonRecordAvroSchemaToAvroConverter does not reject unrecognized Enum symbols

2019-12-02 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-987.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2833
[https://github.com/apache/incubator-gobblin/pull/2833]

> JsonRecordAvroSchemaToAvroConverter does not reject unrecognized Enum symbols
> -
>
> Key: GOBBLIN-987
> URL: https://issues.apache.org/jira/browse/GOBBLIN-987
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Ahmed Abdul Hamid
>Priority: Major
> Fix For: 0.15.0
>
> Attachments: invalid-enum-unit-test.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Here is a failing unit test that demonstrates the issue we encounter when we 
> attempt to encode the records produced by 
> {{JsonRecordAvroSchemaToAvroConverter}} with unrecognized Enum symbols: 
> [^invalid-enum-unit-test.patch]
> Here's how to apply the patch and run the test:
> {code:bash}
> $ git apply invalid-enum-unit-test.patch
> $ ./gradlew :gobblin-core:test --tests *testEnumConversion*
> {code}
> Here's the output:
> {code:java}
> java.lang.NullPointerException: null of string in field fieldToIgnore of org.apache.gobblin.test.TestRecord
>   at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
>   at org.apache.gobblin.converter.avro.JsonRecordAvroSchemaToAvroConverterTest.testEnumConversion(JsonRecordAvroSchemaToAvroConverterTest.java:82)
> {code}
> The root cause of the issue is that {{JsonRecordAvroSchemaToAvroConverter}} 
> allows unrecognized Enum symbols not declared in the specified Avro schema.
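A minimal sketch of the missing check: reject a JSON value unless it is one of the enum's declared symbols. With Avro, the declared symbols would come from the enum schema (`Schema.getEnumSymbols()`); this sketch takes a plain `List<String>` instead so it has no dependencies, and `EnumSymbolValidator` is a hypothetical name rather than the fix that was merged.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check: accept an enum value only if the schema declares that symbol. */
final class EnumSymbolValidator {
  private final String enumName;
  private final Set<String> symbols;

  EnumSymbolValidator(String enumName, List<String> declaredSymbols) {
    this.enumName = enumName;
    // Preserve declaration order so error messages list symbols as in the schema.
    this.symbols = Collections.unmodifiableSet(new LinkedHashSet<>(declaredSymbols));
  }

  /** Returns the value unchanged, or fails fast instead of producing an unencodable record. */
  String validate(String value) {
    if (!symbols.contains(value)) {
      throw new IllegalArgumentException(
          "Symbol '" + value + "' is not declared in enum " + enumName + " " + symbols);
    }
    return value;
  }
}
```

Failing at conversion time points at the offending field directly, instead of surfacing later as the `NullPointerException` from `GenericDatumWriter` shown above.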





[jira] [Resolved] (GOBBLIN-962) Refactor RecursiveCopyableDataset so that the copy entities generation logic can be reused.

2019-11-20 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-962.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2811
[https://github.com/apache/incubator-gobblin/pull/2811]

> Refactor RecursiveCopyableDataset so that the copy entities generation logic 
> can be reused.
> ---
>
> Key: GOBBLIN-962
> URL: https://issues.apache.org/jira/browse/GOBBLIN-962
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Refactor RecursiveCopyableDataset so that the copy entities generation logic 
> can be reused.





[jira] [Resolved] (GOBBLIN-915) Extract cannot parse the timezone

2019-10-17 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-915.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2768
[https://github.com/apache/incubator-gobblin/pull/2768]

> Extract cannot parse the timezone
> -
>
> Key: GOBBLIN-915
> URL: https://issues.apache.org/jira/browse/GOBBLIN-915
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-908) Customize progress report to enable accurate speculative execution

2019-10-15 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-908.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2762
[https://github.com/apache/incubator-gobblin/pull/2762]

> Customize progress report to enable accurate speculative execution
> --
>
> Key: GOBBLIN-908
> URL: https://issues.apache.org/jira/browse/GOBBLIN-908
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-896) Clone schema or field props in AvroFieldRemover

2019-10-10 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-896.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2750
[https://github.com/apache/incubator-gobblin/pull/2750]

> Clone schema or field props in AvroFieldRemover
> ---
>
> Key: GOBBLIN-896
> URL: https://issues.apache.org/jira/browse/GOBBLIN-896
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, `AvroFieldRemover` ignores schema-level and field-level properties
> while cloning the schema and its fields. This change fixes that.





[jira] [Resolved] (GOBBLIN-891) Gobblin Couchbase writer docs are not showing up on readthedocs

2019-10-01 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-891.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2746
[https://github.com/apache/incubator-gobblin/pull/2746]

> Gobblin Couchbase writer docs are not showing up on readthedocs
> ---
>
> Key: GOBBLIN-891
> URL: https://issues.apache.org/jira/browse/GOBBLIN-891
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Shirshanka Das
>Priority: Minor
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Couchbase writer docs were committed as part of GOBBLIN-880. 
> However, they are not showing up on the readthedocs page because they were not
> added to the main index.
>  





[jira] [Resolved] (GOBBLIN-890) Make ExtractID timezone configurable

2019-10-01 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-890.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2745
[https://github.com/apache/incubator-gobblin/pull/2745]

> Make ExtractID timezone configurable
> 
>
> Key: GOBBLIN-890
> URL: https://issues.apache.org/jira/browse/GOBBLIN-890
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-885) Fix ORC-Compaction bug in type-casting

2019-09-23 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-885.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2738
[https://github.com/apache/incubator-gobblin/pull/2738]

> Fix ORC-Compaction bug in type-casting
> --
>
> Key: GOBBLIN-885
> URL: https://issues.apache.org/jira/browse/GOBBLIN-885
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-884) Support schema evolution for ORC across multiple mappers in MR mode

2019-09-19 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-884.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2737
[https://github.com/apache/incubator-gobblin/pull/2737]

> Support schema evolution for ORC across multiple mappers in MR mode
> ---
>
> Key: GOBBLIN-884
> URL: https://issues.apache.org/jira/browse/GOBBLIN-884
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-880) Bump CouchbaseWriter Couchbase SDK version + write docs + cert based auth + enable TTL + dnsSrv

2019-09-16 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-880.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2734
[https://github.com/apache/incubator-gobblin/pull/2734]

> Bump CouchbaseWriter Couchbase SDK version + write docs + cert based auth + 
> enable TTL + dnsSrv
> ---
>
> Key: GOBBLIN-880
> URL: https://issues.apache.org/jira/browse/GOBBLIN-880
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-couchbase
>Reporter: Michael A Menarguez
>Assignee: Shirshanka Das
>Priority: Major
>  Labels: Couchbase
> Fix For: 0.15.0
>
>   Original Estimate: 168h
>  Time Spent: 40m
>  Remaining Estimate: 167h 20m
>
> h1. CURRENT ISSUES
> Currently CouchbaseWriter.java lacks the ability to do the following:
>  # Use certificate based authentication
>  # Set document expiry (TTL)
>  ** based on write time
>  ** based on an offset from a timestamp field contained in the record's data
> (JSON)
>  ** (WILL NOT ADDRESS) set expiry based on a field contained in the record's
> data
>  # Configure DNS SRV for bootstrap host discovery
>  # Missing documentation on CouchbaseWriter usage
>  # Tests do not bring in CouchbaseMock correctly, which causes problems when
> bumping com.couchbase.client:java-client
> h1. PROPOSED SOLUTIONS
>  # Add logic to connect to the buckets using certificate-based auth (this will
> require bumping com.couchbase.client:java-client to a newer version such as
> 2.7.6), along with the associated configs
>  # TTL implementation
>  ## Add configs to set a TTL (documentTTL) and to specify the time units
> (documentTTLUnits) of that setting
>  ## Add logic to specify the path to the field containing the source timestamp
> (documentTTLOriginField) and its units (documentTTLOriginUnits), to
> disambiguate between UNIX (seconds) timestamps and other formats such as
> timestamps in milliseconds.
>  ## N/A, but the logic would be similar to (2)
>  # Add missing dnsSrv config
>  # Write proper documentation
>  # Bring in CouchbaseMock from Gradle and adapt existing unit tests.
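The TTL proposal above can be sketched as a small expiry calculation. `DocumentExpiry` is hypothetical; its parameters mirror the proposed `documentTTL`, `documentTTLUnits`, and `documentTTLOriginUnits` configs, and `originTime` stands for the value read via `documentTTLOriginField`. The sketch returns an absolute expiry in epoch seconds, because Couchbase treats expiry values larger than 30 days as absolute Unix timestamps rather than relative offsets.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper computing an absolute document expiry in epoch seconds. */
final class DocumentExpiry {
  /**
   * @param ttl         TTL amount (mirrors the proposed documentTTL config)
   * @param ttlUnits    units of ttl (mirrors documentTTLUnits)
   * @param originTime  source timestamp read from the record, or null to use write time
   * @param originUnits units of originTime (mirrors documentTTLOriginUnits),
   *                    e.g. SECONDS for UNIX time or MILLISECONDS
   * @param nowMillis   current wall-clock time, used only when originTime is null
   */
  static long expiryEpochSeconds(long ttl, TimeUnit ttlUnits,
      Long originTime, TimeUnit originUnits, long nowMillis) {
    long originSeconds = (originTime == null)
        ? TimeUnit.MILLISECONDS.toSeconds(nowMillis) // expiry based on write time
        : originUnits.toSeconds(originTime);         // expiry based on the record's field
    return originSeconds + ttlUnits.toSeconds(ttl);
  }
}
```

Normalizing both inputs to seconds is what resolves the UNIX-seconds versus milliseconds ambiguity mentioned in the proposal.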





[jira] [Resolved] (GOBBLIN-766) Emit Workunits created event in Apache gobblin

2019-09-11 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-766.
---
Resolution: Fixed

Issue resolved by pull request #2706
[https://github.com/apache/incubator-gobblin/pull/2706]

> Emit  Workunits created  event  in Apache gobblin
> -
>
> Key: GOBBLIN-766
> URL: https://issues.apache.org/jira/browse/GOBBLIN-766
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: kraman
>Priority: Minor
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Emit a new "workunits created" metric to be captured for monitoring/alerting.





[jira] [Resolved] (GOBBLIN-862) Security token encryption support in SFDC connector

2019-08-21 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-862.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2718
[https://github.com/apache/incubator-gobblin/pull/2718]

> Security token encryption support in SFDC connector
> ---
>
> Key: GOBBLIN-862
> URL: https://issues.apache.org/jira/browse/GOBBLIN-862
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: gobblin-salesforce
>Reporter: Monish Vachhani
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>
> Add security token encryption support to the SFDC connector so that the
> security token does not have to be stored as plain text.





[jira] [Resolved] (GOBBLIN-851) Provide capability to disable hive schema registration in partition level

2019-08-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-851.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2707
[https://github.com/apache/incubator-gobblin/pull/2707]

> Provide capability to disable hive schema registration in partition level
> -
>
> Key: GOBBLIN-851
> URL: https://issues.apache.org/jira/browse/GOBBLIN-851
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have had problems when the table-level schema and the partition-level
> schema diverge. Consider the case where a user registers two partitions,
> 2019/08/10 and 2019/08/11, but the schema changes in between (S1->S2). The
> table level now has schema S2, but 2019/08/10 still has schema S1.
> Queries using the latest schema then fail on the old partition.





[jira] [Resolved] (GOBBLIN-857) Extending getTopicsFromConfigStore to accept topicName directly

2019-08-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-857.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2713
[https://github.com/apache/incubator-gobblin/pull/2713]

> Extending getTopicsFromConfigStore to accept topicName directly
> ---
>
> Key: GOBBLIN-857
> URL: https://issues.apache.org/jira/browse/GOBBLIN-857
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-828) Make dynamic config override job config

2019-07-18 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-828.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2689
[https://github.com/apache/incubator-gobblin/pull/2689]

> Make dynamic config override job config
> ---
>
> Key: GOBBLIN-828
> URL: https://issues.apache.org/jira/browse/GOBBLIN-828
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jack Moseley
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-827) Add more events

2019-07-18 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-827.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2688
[https://github.com/apache/incubator-gobblin/pull/2688]

> Add more events
> ---
>
> Key: GOBBLIN-827
> URL: https://issues.apache.org/jira/browse/GOBBLIN-827
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add the following events:
> - `JobStateEventBuilder` to report gobblin job state or MR job state
> - `EntityMissingEventBuilder` to report a missing instance of a certain entity





[jira] [Resolved] (GOBBLIN-829) Fix codecov

2019-07-17 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-829.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2690
[https://github.com/apache/incubator-gobblin/pull/2690]

> Fix codecov
> ---
>
> Key: GOBBLIN-829
> URL: https://issues.apache.org/jira/browse/GOBBLIN-829
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-796) Add support partial updates for flowConfig

2019-07-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-796.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2663
[https://github.com/apache/incubator-gobblin/pull/2663]

> Add support partial updates for flowConfig
> --
>
> Key: GOBBLIN-796
> URL: https://issues.apache.org/jira/browse/GOBBLIN-796
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jack Moseley
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-825) Cache record schema in Plain Object reporters rather than create a new schema each time

2019-07-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-825.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2686
[https://github.com/apache/incubator-gobblin/pull/2686]

> Cache record schema in Plain Object reporters rather than create a new schema 
> each time
> ---
>
> Key: GOBBLIN-825
> URL: https://issues.apache.org/jira/browse/GOBBLIN-825
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Vikram Bohra
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Rather than create a new instance of the same schema each time, it is better
> to create it once and reuse it.
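A minimal sketch of the create-once-and-reuse approach. `CachedSchemaReporter` is hypothetical, and a `Supplier<String>` stands in for whatever builds the Avro schema in the real reporters. Building the schema eagerly into a `final` field also makes it safe to share across threads without extra synchronization, provided the schema object is not mutated afterwards.

```java
import java.util.function.Supplier;

/** Hypothetical reporter that builds its record schema once and reuses it. */
final class CachedSchemaReporter {
  // Built once in the constructor instead of on every report() call.
  private final String schema;

  CachedSchemaReporter(Supplier<String> schemaFactory) {
    this.schema = schemaFactory.get();
  }

  /** Builds a record for one metric using the cached schema. */
  String report(String metricName) {
    return schema + ":" + metricName;
  }
}
```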





[jira] [Resolved] (GOBBLIN-810) Include flow edge ID in job name

2019-07-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-810.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2675
[https://github.com/apache/incubator-gobblin/pull/2675]

> Include flow edge ID in job name
> 
>
> Key: GOBBLIN-810
> URL: https://issues.apache.org/jira/browse/GOBBLIN-810
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jack Moseley
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-821) Create Code Coverage Report for Gobblin

2019-07-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-821.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2684
[https://github.com/apache/incubator-gobblin/pull/2684]

> Create Code Coverage Report for Gobblin
> ---
>
> Key: GOBBLIN-821
> URL: https://issues.apache.org/jira/browse/GOBBLIN-821
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-820) Support keyed writer for Kafka

2019-07-09 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-820.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2682
[https://github.com/apache/incubator-gobblin/pull/2682]

> Support keyed writer for Kafka
> --
>
> Key: GOBBLIN-820
> URL: https://issues.apache.org/jira/browse/GOBBLIN-820
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Reporter: Shirshanka Das
>Assignee: Shirshanka Das
>Priority: Major
>  Labels: kafka
> Fix For: 0.15.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The current Kafka writer uses the non-keyed API to produce to Kafka. 
> This issue proposes to add support for keyed writes to Kafka. 
> Constraints:
>  * Minimal changes needed to existing pipeline configuration to add support 
> for keyed writes to Kafka
>  * Minimal changes to gobblin-core. Do not add general support for 
> keyed writers as part of this issue. 
>  
>  





[jira] [Resolved] (GOBBLIN-813) Make SFDC connector support encrypted Salesforce client id and client secret

2019-06-25 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-813.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2677
[https://github.com/apache/incubator-gobblin/pull/2677]

> Make SFDC connector support encrypted Salesforce client id and client secret
> 
>
> Key: GOBBLIN-813
> URL: https://issues.apache.org/jira/browse/GOBBLIN-813
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-799) Bugs in AvroSchemaCheckDefaultStrategy that not return after check ENUM and FIXED

2019-06-17 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-799.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2666
[https://github.com/apache/incubator-gobblin/pull/2666]

> Bugs in  AvroSchemaCheckDefaultStrategy that not return after check ENUM and 
> FIXED
> --
>
> Key: GOBBLIN-799
> URL: https://issues.apache.org/jira/browse/GOBBLIN-799
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Zihan Li
>Priority: Minor
> Fix For: 0.15.0
>
>
> AvroSchemaCheckDefaultStrategy does not return after checking ENUM and FIXED 
> schemas; the fix is to add the missing return statements.

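To make the bug class concrete, here is a minimal sketch of a type-by-type compatibility check in which each branch returns its own result. The `SchemaCheckSketch` class, its `Kind` enum and `Schema` record are hypothetical simplifications, not Gobblin's `AvroSchemaCheckDefaultStrategy` or Avro's `Schema`. Without the `return` in the ENUM and FIXED branches, control would fall through to the generic name check and two enums with different symbols could wrongly be reported as compatible.

```java
import java.util.List;

// Hypothetical, simplified model of a schema compatibility check; it only
// illustrates why ENUM and FIXED branches must return their own result.
class SchemaCheckSketch {

  enum Kind { ENUM, FIXED, STRING }

  record Schema(Kind kind, String name, List<String> symbols, int size) {}

  static boolean compatible(Schema expected, Schema actual) {
    if (expected.kind() != actual.kind()) {
      return false;
    }
    if (expected.kind() == Kind.ENUM) {
      // Returning here prevents falling through to the name-only check below.
      return expected.symbols().equals(actual.symbols());
    }
    if (expected.kind() == Kind.FIXED) {
      // FIXED schemas must also have the same size.
      return expected.size() == actual.size();
    }
    // Fallback for other kinds in this sketch: compare names only.
    return expected.name().equals(actual.name());
  }
}
```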




[jira] [Resolved] (GOBBLIN-800) Remove the metric context cache from GobblinMetricsRegistry

2019-06-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-800.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2667
[https://github.com/apache/incubator-gobblin/pull/2667]

> Remove the metric context cache from GobblinMetricsRegistry
> ---
>
> Key: GOBBLIN-800
> URL: https://issues.apache.org/jira/browse/GOBBLIN-800
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Remove the metric context cache from GobblinMetricsRegistry





[jira] [Resolved] (GOBBLIN-798) Clean up workflows from Helix when the Gobblin application master starts

2019-06-10 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-798.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2665
[https://github.com/apache/incubator-gobblin/pull/2665]

> Clean up workflows from Helix when the Gobblin application master starts
> 
>
> Key: GOBBLIN-798
> URL: https://issues.apache.org/jira/browse/GOBBLIN-798
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If the application master aborts, YARN may spawn a new one. The second 
> application master will resubmit the jobs. This results in duplicate jobs in 
> Helix, and multiple instances of a job may run, producing duplicate data.
> The Gobblin application master should clean up all workflows on startup to 
> avoid executing multiple instances of a job.
>  





[jira] [Updated] (GOBBLIN-798) Clean up workflows from Helix when the Gobblin application master starts

2019-06-06 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-798:
--
Summary: Clean up workflows from Helix when the Gobblin application master 
starts  (was: Cleanup workflows from Helix when the Gobblin application master 
starts)

> Clean up workflows from Helix when the Gobblin application master starts
> 
>
> Key: GOBBLIN-798
> URL: https://issues.apache.org/jira/browse/GOBBLIN-798
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
>
> If the application master aborts, YARN may spawn a new one. The second 
> application master will resubmit the jobs. This results in duplicate jobs in 
> Helix, and multiple instances of a job may run, producing duplicate data.
> The Gobblin application master should clean up all workflows on startup to 
> avoid executing multiple instances of a job.
>  





[jira] [Created] (GOBBLIN-798) Cleanup workflows from Helix when the Gobblin application master starts

2019-06-06 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-798:
-

 Summary: Cleanup workflows from Helix when the Gobblin application 
master starts
 Key: GOBBLIN-798
 URL: https://issues.apache.org/jira/browse/GOBBLIN-798
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


If the application master aborts, YARN may spawn a new one. The second 
application master will resubmit the jobs. This results in duplicate jobs in 
Helix, and multiple instances of a job may run, producing duplicate data.

The Gobblin application master should clean up all workflows on startup to 
avoid executing multiple instances of a job.

 





[jira] [Resolved] (GOBBLIN-791) Fix hanging stream on error in asynchronous execution model

2019-06-03 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-791.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2659
[https://github.com/apache/incubator-gobblin/pull/2659]

> Fix hanging stream on error in asynchronous execution model
> ---
>
> Key: GOBBLIN-791
> URL: https://issues.apache.org/jira/browse/GOBBLIN-791
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The asynchronous task execution model uses ReactiveX streams with a 
> ConnectableFlowable. This is a hot flowable, so it does not terminate when 
> all subscribers have exited. As a result, the extractor continues to emit 
> records after downstream constructs have exited due to an error. This is 
> especially problematic for extractors that wait on control message acks, 
> since the extractor may hang.
> Another issue is that errors do not propagate upward, so errors in the writer 
> do not fail the fork. Change the state of the fork in onCancel() to a failure 
> state so that the task is failed.
>  





[jira] [Created] (GOBBLIN-791) Fix hanging stream on error in asynchronous execution model

2019-05-31 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-791:
-

 Summary: Fix hanging stream on error in asynchronous execution 
model
 Key: GOBBLIN-791
 URL: https://issues.apache.org/jira/browse/GOBBLIN-791
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


The asynchronous task execution model uses ReactiveX streams with a 
ConnectableFlowable. This is a hot flowable, so it does not terminate when all 
subscribers have exited. As a result, the extractor continues to emit records 
after downstream constructs have exited due to an error. This is especially 
problematic for extractors that wait on control message acks, since the 
extractor may hang.

Another issue is that errors do not propagate upward, so errors in the writer 
do not fail the fork. Change the state of the fork in onCancel() to a failure 
state so that the task is failed.

 





[jira] [Resolved] (GOBBLIN-787) Add an option to include the task start time in the output file name

2019-05-29 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-787.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2653
[https://github.com/apache/incubator-gobblin/pull/2653]

> Add an option to include the task start time in the output file name
> 
>
> Key: GOBBLIN-787
> URL: https://issues.apache.org/jira/browse/GOBBLIN-787
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In some cases a task may be scheduled to run on multiple workers. One case 
> where this happens is when running with the Helix task execution framework: 
> Helix may reschedule a task on a different worker if it loses contact with the 
> original worker, which may continue executing for some time before its task 
> is terminated. During this period, if the output file names collide, data 
> publish may fail.
> Add an option "writer.addTaskTimestamp" that can be used to reduce the chance 
> of name collisions by appending a task startup timestamp to the file name.
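A minimal sketch of how a writer might honor the `writer.addTaskTimestamp` option. Only the config key comes from the issue; the `TaskTimestampFileNames` helper, the `false` default, and the `base.timestamp.extension` naming scheme are assumptions for illustration.

```java
import java.util.Properties;

// Hypothetical helper: the key name is from the issue, while the default value
// and the file name layout are assumptions.
class TaskTimestampFileNames {

  static final String ADD_TASK_TIMESTAMP = "writer.addTaskTimestamp";

  static String outputFileName(Properties props, String baseName, String extension,
      long taskStartMillis) {
    boolean addTimestamp = Boolean.parseBoolean(props.getProperty(ADD_TASK_TIMESTAMP, "false"));
    // Two attempts of the same task started at different times get different names.
    return addTimestamp
        ? baseName + "." + taskStartMillis + "." + extension
        : baseName + "." + extension;
  }
}
```

Because the timestamp is captured when the task starts, a rescheduled copy of the task on another worker writes to a different file than the stale copy, which narrows (but does not eliminate) the window for collisions.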





[jira] [Created] (GOBBLIN-787) Add an option to include the task start time in the output file name

2019-05-28 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-787:
-

 Summary: Add an option to include the task start time in the 
output file name
 Key: GOBBLIN-787
 URL: https://issues.apache.org/jira/browse/GOBBLIN-787
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


In some cases a task may be scheduled to run on multiple workers. One case 
where this happens is when running with the Helix task execution framework: 
Helix may reschedule a task on a different worker if it loses contact with the 
original worker, which may continue executing for some time before its task is 
terminated. During this period, if the output file names collide, data publish 
may fail.

Add an option "writer.addTaskTimestamp" that can be used to reduce the chance 
of name collisions by appending a task startup timestamp to the file name.





[jira] [Resolved] (GOBBLIN-783) Fix the double referencing issue for job type config

2019-05-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-783.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2646
[https://github.com/apache/incubator-gobblin/pull/2646]

> Fix the double referencing issue for job type config
> 
>
> Key: GOBBLIN-783
> URL: https://issues.apache.org/jira/browse/GOBBLIN-783
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-766) Emit Workunits created event in Apache gobblin

2019-05-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-766.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2636
[https://github.com/apache/incubator-gobblin/pull/2636]

> Emit  Workunits created  event  in Apache gobblin
> -
>
> Key: GOBBLIN-766
> URL: https://issues.apache.org/jira/browse/GOBBLIN-766
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: kraman
>Priority: Minor
> Fix For: 0.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Emit a new "workunits created" metric for monitoring and alerting.





[jira] [Resolved] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck

2019-05-28 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-780.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2644
[https://github.com/apache/incubator-gobblin/pull/2644]

> Handle scenarios that cause the YarnAutoScalingManager to be stuck
> --
>
> Key: GOBBLIN-780
> URL: https://issues.apache.org/jira/browse/GOBBLIN-780
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a 
> ScheduledExecutorService in YarnAutoScalingManager. If the runnable 
> encounters an exception, the executor service will stop scheduling it. 
> Catch all exceptions in the runnable, log them, and do not re-raise.
> Issue 2: The auto scaler may reduce the container count to 0. Helix will not 
> schedule any flows if there are no participants connected, so the auto 
> scaler keeps the container count at 0 and no progress is made. Fix this by 
> not allowing the container count to be reduced below 1.
>  
>  
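Both fixes can be sketched in a single runnable. `SafeAutoScalingRunnable` is hypothetical and is not Gobblin's `YarnAutoScalingRunnable`; the two functional parameters stand in for "compute the desired container count" and "ask the YARN service for that many containers", both assumed interfaces. The behavior it guards against is documented for `ScheduledExecutorService.scheduleAtFixedRate`: if any execution throws, subsequent executions are suppressed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

// Hypothetical sketch of both fixes; not Gobblin's actual auto scaling classes.
class SafeAutoScalingRunnable implements Runnable {

  private final IntSupplier desiredContainers;  // assumed: derived from the Helix workload
  private final IntConsumer requestContainers;  // assumed: forwards the target to YARN

  SafeAutoScalingRunnable(IntSupplier desiredContainers, IntConsumer requestContainers) {
    this.desiredContainers = desiredContainers;
    this.requestContainers = requestContainers;
  }

  @Override
  public void run() {
    try {
      // Issue 2: keep at least one container so Helix has a participant.
      int target = Math.max(1, desiredContainers.getAsInt());
      requestContainers.accept(target);
    } catch (Exception e) {
      // Issue 1: an exception escaping run() suppresses all later executions
      // of a scheduleAtFixedRate task, so log it and return normally.
      System.err.println("Auto scaling iteration failed: " + e);
    }
  }

  // Schedules the runnable at a fixed rate; the caller must shut down the executor.
  static ScheduledExecutorService schedule(SafeAutoScalingRunnable runnable, long intervalSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(runnable, 0, intervalSeconds, TimeUnit.SECONDS);
    return executor;
  }
}
```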





[jira] [Updated] (GOBBLIN-780) Handle scenarios that cause the YarnAutoScalingManager to be stuck

2019-05-23 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-780:
--
Summary: Handle scenarios that cause the YarnAutoScalingManager to be stuck 
 (was: Handle scenarios that causes the YarnAutoScalingManager to be stuck)

> Handle scenarios that cause the YarnAutoScalingManager to be stuck
> --
>
> Key: GOBBLIN-780
> URL: https://issues.apache.org/jira/browse/GOBBLIN-780
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>
> Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a 
> ScheduledExecutorService in YarnAutoScalingManager. If the runnable 
> encounters an exception, the executor service will stop scheduling it. 
> Catch all exceptions in the runnable, log them, and do not re-raise.
> Issue 2: The auto scaler may reduce the container count to 0. Helix will not 
> schedule any flows if there are no participants connected, so the auto 
> scaler keeps the container count at 0 and no progress is made. Fix this by 
> not allowing the container count to be reduced below 1.
>  
>  





[jira] [Created] (GOBBLIN-780) Handle scenarios that causes the YarnAutoScalingManager to be stuck

2019-05-23 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-780:
-

 Summary: Handle scenarios that causes the YarnAutoScalingManager 
to be stuck
 Key: GOBBLIN-780
 URL: https://issues.apache.org/jira/browse/GOBBLIN-780
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


Issue 1: The YarnAutoScalingRunnable is run on a fixed schedule by a 
ScheduledExecutorService in YarnAutoScalingManager. If the runnable encounters 
an exception, the executor service will stop scheduling it. Catch all 
exceptions in the runnable, log them, and do not re-raise.

Issue 2: The auto scaler may reduce the container count to 0. Helix will not 
schedule any flows if there are no participants connected, so the auto scaler 
keeps the container count at 0 and no progress is made. Fix this by not 
allowing the container count to be reduced below 1.

 

 





[jira] [Resolved] (GOBBLIN-777) Remove container request after container allocation

2019-05-21 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-777.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2641
[https://github.com/apache/incubator-gobblin/pull/2641]

> Remove container request after container allocation
> ---
>
> Key: GOBBLIN-777
> URL: https://issues.apache.org/jira/browse/GOBBLIN-777
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Due to YARN-1902, a request for containers may allocate more containers than 
> desired since the requests are not automatically removed when a container is 
> allocated.
> The Gobblin YarnService needs to work around this issue by removing a 
> matching container request in the container allocation callback.
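A simplified model of the workaround: on each allocation, drop one pending request that matches the allocated container so it is not satisfied again. `PendingContainerRequests` is hypothetical and does not use Hadoop's API; in the real YARN client, `AMRMClient` offers `getMatchingRequests(...)` and `removeContainerRequest(...)` for this purpose. Matching on memory size alone is an assumption of this sketch; real matching also involves priority, location, and the scheduler's resource normalization.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of pending requests; each request is identified only by
// its memory size, which is a simplification.
class PendingContainerRequests {

  record Request(int memoryMb) {}

  private final List<Request> pending = new ArrayList<>();

  void add(Request request) {
    pending.add(request);
  }

  // Called from the allocation callback: remove one matching request so the
  // resource manager does not allocate another container for it.
  boolean onContainerAllocated(int allocatedMemoryMb) {
    Iterator<Request> it = pending.iterator();
    while (it.hasNext()) {
      if (it.next().memoryMb() == allocatedMemoryMb) {
        it.remove();
        return true;
      }
    }
    return false;
  }

  int pendingCount() {
    return pending.size();
  }
}
```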





[jira] [Created] (GOBBLIN-777) Remove container request after container allocation

2019-05-21 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-777:
-

 Summary: Remove container request after container allocation
 Key: GOBBLIN-777
 URL: https://issues.apache.org/jira/browse/GOBBLIN-777
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Due to YARN-1902, a request for containers may allocate more containers than 
desired since the requests are not automatically removed when a container is 
allocated.

The Gobblin YarnService needs to work around this issue by removing a matching 
container request in the container allocation callback.





[jira] [Resolved] (GOBBLIN-774) Send nack when a control message handler fails in Fork

2019-05-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-774.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2639
[https://github.com/apache/incubator-gobblin/pull/2639]

> Send nack when a control message handler fails in Fork
> --
>
> Key: GOBBLIN-774
> URL: https://issues.apache.org/jira/browse/GOBBLIN-774
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fork raises an error without acking or nacking the message if the control 
> message handler raises an error. This can leave another thread waiting 
> indefinitely for a control message ack. Fork.consumeRecordStream() should 
> handle control message exceptions by calling nack() with the exception before 
> re-raising the error.

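The handling pattern described above can be sketched as follows. The `Ackable` interface and `ControlMessageDispatch` class are hypothetical stand-ins, not Gobblin's actual `Fork` or ack types, and acking on success is an assumption of this sketch. The key point is the ordering: `nack()` runs before the exception is re-raised, so a thread blocked on the ack is released.

```java
import java.util.function.Consumer;

// Hypothetical types for illustration; Gobblin's actual ack API may differ.
interface Ackable {
  void ack();

  void nack(Throwable error);
}

class ControlMessageDispatch {

  static <M extends Ackable> void handle(M message, Consumer<M> handler) {
    try {
      handler.accept(message);
    } catch (RuntimeException e) {
      // Nack first so any thread waiting on this message is released,
      // then re-raise so the fork still fails.
      message.nack(e);
      throw e;
    }
    // Assumption: a successful handler acks the message here.
    message.ack();
  }
}
```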




[jira] [Created] (GOBBLIN-774) Send nack when a control message handler fails in Fork

2019-05-20 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-774:
-

 Summary: Send nack when a control message handler fails in Fork
 Key: GOBBLIN-774
 URL: https://issues.apache.org/jira/browse/GOBBLIN-774
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


Fork raises an error without acking or nacking the message if the control 
message handler raises an error. This can leave another thread waiting 
indefinitely for a control message ack.

Fork.consumeRecordStream() should handle control message exceptions by calling 
nack() with the exception before re-raising the error.





[jira] [Resolved] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory

2019-05-15 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-770.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2634
[https://github.com/apache/incubator-gobblin/pull/2634]

> Add JVM configuration to avoid exhausting YARN container memory 
> 
>
> Key: GOBBLIN-770
> URL: https://issues.apache.org/jira/browse/GOBBLIN-770
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current code sets Xmx to the value of the YARN container memory limit. 
> The JVM is highly likely to hit the container memory limit with this 
> configuration due to overhead costs that are not in the JVM heap.
> Configuration should be added to set JVM memory as a percentage of the 
> container memory minus a configurable overhead.
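One reading of "a percentage of the container memory minus a configurable overhead" is `Xmx = containerMemory * ratio - overhead`. The sketch below follows that reading. The `JvmMemorySizing` class, its parameter names, and the validation rules are hypothetical, not Gobblin's configuration keys.

```java
// Hypothetical sketch: Xmx = container memory * ratio - overhead. The formula
// is one interpretation of the issue text; names are illustrative only.
class JvmMemorySizing {

  static int xmxMb(int containerMemoryMb, double xmxRatio, int overheadMb) {
    if (xmxRatio <= 0 || xmxRatio > 1) {
      throw new IllegalArgumentException("xmxRatio must be in (0, 1]");
    }
    // Leave headroom for off-heap memory (metaspace, thread stacks, direct buffers).
    int xmx = (int) (containerMemoryMb * xmxRatio) - overheadMb;
    if (xmx <= 0) {
      throw new IllegalArgumentException("overhead leaves no memory for the heap");
    }
    return xmx;
  }

  static String xmxOption(int containerMemoryMb, double xmxRatio, int overheadMb) {
    return "-Xmx" + xmxMb(containerMemoryMb, xmxRatio, overheadMb) + "M";
  }
}
```

For example, a 4096 MB container with a ratio of 0.75 and a 256 MB overhead yields `-Xmx2816M`, leaving 1280 MB of the container for memory outside the heap.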





[jira] [Assigned] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran reassigned GOBBLIN-770:
-

Assignee: Hung Tran

> Add JVM configuration to avoid exhausting YARN container memory 
> 
>
> Key: GOBBLIN-770
> URL: https://issues.apache.org/jira/browse/GOBBLIN-770
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
>
> The current code sets Xmx to the value of the YARN container memory limit. 
> The JVM is highly likely to hit the container memory limit with this 
> configuration due to overhead costs that are not in the JVM heap.
> Configuration should be added to set JVM memory as a percentage of the 
> container memory minus a configurable overhead.





[jira] [Created] (GOBBLIN-770) Add JVM configuration to avoid exhausting YARN container memory

2019-05-14 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-770:
-

 Summary: Add JVM configuration to avoid exhausting YARN container 
memory 
 Key: GOBBLIN-770
 URL: https://issues.apache.org/jira/browse/GOBBLIN-770
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


The current code sets Xmx to the value of the YARN container memory limit. The 
JVM is highly likely to hit the container memory limit with this configuration 
due to overhead costs that are not in the JVM heap.

Configuration should be added to set JVM memory as a percentage of the 
container memory minus a configurable overhead.





[jira] [Resolved] (GOBBLIN-769) Support string record timestamp in TimeBasedAvroWriterPartitioner

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-769.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2632
[https://github.com/apache/incubator-gobblin/pull/2632]

> Support string record timestamp in TimeBasedAvroWriterPartitioner
> -
>
> Key: GOBBLIN-769
> URL: https://issues.apache.org/jira/browse/GOBBLIN-769
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, if a record timestamp is a string, 
> `TimeBasedAvroWriterPartitioner` does not recognize it and falls back to the 
> current time.

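A minimal sketch of timestamp extraction that also accepts string values. `RecordTimestamps` is hypothetical and is not the partitioner's actual code. It assumes the string holds epoch milliseconds as digits, not a formatted date, and keeps the previous behavior of falling back to a caller-supplied time (for example `System.currentTimeMillis()`). Avro string fields are typically read as `org.apache.avro.util.Utf8`, which implements `CharSequence`, hence the `CharSequence` check.

```java
// Hypothetical sketch; assumes string timestamps contain epoch millis as digits.
class RecordTimestamps {

  static long toEpochMillis(Object value, long fallbackMillis) {
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    if (value instanceof CharSequence) {
      try {
        return Long.parseLong(value.toString().trim());
      } catch (NumberFormatException e) {
        // Unparseable strings keep the old behavior of using the fallback time.
        return fallbackMillis;
      }
    }
    return fallbackMillis;
  }
}
```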




[jira] [Resolved] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-14 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-762.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2626
[https://github.com/apache/incubator-gobblin/pull/2626]

> Add automatic scaling for Gobblin on YARN
> -
>
> Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.
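The scaling decision described above can be sketched as a pure function. The issue does not say how the container count is computed from the required partitions, so the "N partitions per container, rounded up" rule and every name in `AutoScalingSketch` are assumptions; only the request-more versus release-unused comparison follows the text.

```java
// Hypothetical sketch of the scaling decision; the sizing rule is an assumption.
class AutoScalingSketch {

  enum Action { REQUEST_MORE, RELEASE_UNUSED, NONE }

  // Assumed rule: one container per partitionsPerContainer partitions, rounded up.
  static int targetContainers(int requiredPartitions, int partitionsPerContainer) {
    if (requiredPartitions < 0 || partitionsPerContainer <= 0) {
      throw new IllegalArgumentException("invalid partition counts");
    }
    return (requiredPartitions + partitionsPerContainer - 1) / partitionsPerContainer;
  }

  // Mirrors the description: request more when the target exceeds what was
  // previously requested; free unused containers when it is below the allocation.
  static Action decide(int target, int previouslyRequested, int allocated) {
    if (target > previouslyRequested) {
      return Action.REQUEST_MORE;
    }
    if (target < allocated) {
      return Action.RELEASE_UNUSED;
    }
    return Action.NONE;
  }
}
```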





[jira] [Resolved] (GOBBLIN-767) Support different time units in TimeBasedWriterPartitioner

2019-05-10 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-767.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2630
[https://github.com/apache/incubator-gobblin/pull/2630]

> Support different time units in TimeBasedWriterPartitioner
> --
>
> Key: GOBBLIN-767
> URL: https://issues.apache.org/jira/browse/GOBBLIN-767
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, `TimeBasedWriterPartitioner` assumes the timestamp value from a 
> record is in millis. The task is to remove this assumption and support 
> timestamps in different units, defaulting to millis.

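One way to support configurable units is to normalize every record timestamp to milliseconds with `java.util.concurrent.TimeUnit`. `TimestampUnits` and the idea of passing the unit as a string (for example, read from job configuration) are assumptions for illustration, not the partitioner's actual configuration.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: normalize a record timestamp to millis, defaulting to
// MILLISECONDS when no unit is configured.
class TimestampUnits {

  static long toMillis(long recordTimestamp, String unitName) {
    TimeUnit unit = (unitName == null || unitName.isEmpty())
        ? TimeUnit.MILLISECONDS
        : TimeUnit.valueOf(unitName.toUpperCase(Locale.ROOT));
    // TimeUnit.toMillis truncates when converting from finer units (e.g. micros).
    return unit.toMillis(recordTimestamp);
  }
}
```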




[jira] [Resolved] (GOBBLIN-763) Support fields removal for compaction dedup key schema

2019-05-08 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-763.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2627
[https://github.com/apache/incubator-gobblin/pull/2627]

> Support fields removal for compaction dedup key schema
> --
>
> Key: GOBBLIN-763
> URL: https://issues.apache.org/jira/browse/GOBBLIN-763
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> - Remove fields, specified by configuration 
> `compaction.job.key.fieldBlacklist`, while computing compaction dedup key 
> schema
> - Fix incorrect `AvroUtils.removeUncomparableFields` implementation, which 
> only keeps the first field of any schema, dropping all other fields which 
> have the same schema. 
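The two bullet points can be illustrated with a plain-Java model of key-field selection: drop blacklisted fields by name and drop fields of uncomparable types, deciding per field so that distinct fields sharing a type are all kept. `DedupKeyFields` and its string-typed `Field` record are hypothetical stand-ins for Avro schemas, not `AvroUtils.removeUncomparableFields` itself; treating `map` as uncomparable reflects Avro's rule that map values cannot be ordered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of dedup key schema computation using plain strings
// instead of Avro schemas.
class DedupKeyFields {

  record Field(String name, String type) {}

  static List<Field> keyFields(List<Field> fields, Set<String> blacklist,
      Set<String> uncomparableTypes) {
    List<Field> kept = new ArrayList<>();
    for (Field f : fields) {
      // Decide per field; never deduplicate by type, which would drop
      // distinct fields that happen to share the same schema.
      if (!blacklist.contains(f.name()) && !uncomparableTypes.contains(f.type())) {
        kept.add(f);
      }
    }
    return kept;
  }
}
```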





[jira] [Resolved] (GOBBLIN-764) Allow passing of rest.li parameters to throttling client

2019-05-06 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-764.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2628
[https://github.com/apache/incubator-gobblin/pull/2628]

> Allow passing of rest.li parameters to throttling client
> 
>
> Key: GOBBLIN-764
> URL: https://issues.apache.org/jira/browse/GOBBLIN-764
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-02 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran updated GOBBLIN-762:
--
Description: 
Gobblin on YARN needs a way to scale up and down the containers based on the 
workload.

Added `YarnAutoScalingManager` which can be started by the 
`GobblinApplicationMaster` by setting the 
`gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
scheduled task with a default interval of 60 seconds to detect the number of 
required partitions for the workflows submitted to Helix. It will request the 
`YarnService` to scale to a computed number of containers. If the requested 
number of containers is higher than the YarnService has previously requested 
then it will request more containers. If the requested count is less than the 
current number of allocated containers then it will free any unused containers.

  was:Gobblin on YARN needs a way to scale up and down the containers based on 
the workload.


> Add automatic scaling for Gobblin on YARN
> -
>
> Key: GOBBLIN-762
> URL: https://issues.apache.org/jira/browse/GOBBLIN-762
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Priority: Major
>
> Gobblin on YARN needs a way to scale up and down the containers based on the 
> workload.
> Added `YarnAutoScalingManager` which can be started by the 
> `GobblinApplicationMaster` by setting the 
> `gobblin.yarn.app.master.serviceClasses` configuration. This class runs a 
> scheduled task with a default interval of 60 seconds to detect the number of 
> required partitions for the workflows submitted to Helix. It will request the 
> `YarnService` to scale to a computed number of containers. If the requested 
> number of containers is higher than the YarnService has previously requested 
> then it will request more containers. If the requested count is less than the 
> current number of allocated containers then it will free any unused 
> containers.





[jira] [Created] (GOBBLIN-762) Add automatic scaling for Gobblin on YARN

2019-05-02 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-762:
-

 Summary: Add automatic scaling for Gobblin on YARN
 Key: GOBBLIN-762
 URL: https://issues.apache.org/jira/browse/GOBBLIN-762
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran


Gobblin on YARN needs a way to scale up and down the containers based on the 
workload.





[jira] [Resolved] (GOBBLIN-761) Fix runtime property like Topic.name not available in Compaction when fetching configStore object

2019-05-01 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-761.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2625
[https://github.com/apache/incubator-gobblin/pull/2625]

> Fix runtime property like Topic.name not available in Compaction when 
> fetching configStore object
> -
>
> Key: GOBBLIN-761
> URL: https://issues.apache.org/jira/browse/GOBBLIN-761
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-747) Set expected schema when creating workunits

2019-04-23 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-747.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2612
[https://github.com/apache/incubator-gobblin/pull/2612]

> Set expected schema when creating workunits
> ---
>
> Key: GOBBLIN-747
> URL: https://issues.apache.org/jira/browse/GOBBLIN-747
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Set the gobblin.copy.expectedSchema property when creating the workunit to 
> enable schema checking in distcp.





[jira] [Resolved] (GOBBLIN-738) Open a way to customize decoding KafkaConsumerRecord

2019-04-20 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-738.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2605
[https://github.com/apache/incubator-gobblin/pull/2605]

> Open a way to customize decoding KafkaConsumerRecord
> 
>
> Key: GOBBLIN-738
> URL: https://issues.apache.org/jira/browse/GOBBLIN-738
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, decoding a `KafkaConsumerRecord` is limited to 2 forms:
>   - decode as a `ByteArrayBasedKafkaRecord` message
>   - convert value from a `DecodeableKafkaRecord` message
> The task is to open a way to plug in an arbitrary decoding mechanism.
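One common way to open decoding to arbitrary mechanisms is a strategy interface selected by configuration. The sketch below only illustrates that pattern under stated assumptions: `ConsumerRecordStub`, `RecordDecoder`, the config key `example.kafka.record.decoder`, and the registry are all hypothetical and are not Gobblin's API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Protocol


@dataclass
class ConsumerRecordStub:
    """Hypothetical stand-in for a KafkaConsumerRecord: raw key/value bytes plus offset."""
    key: bytes
    value: bytes
    offset: int


class RecordDecoder(Protocol):
    """Pluggable contract: turn one consumed record into an application object."""
    def decode(self, record: ConsumerRecordStub) -> Any: ...


class ByteArrayDecoder:
    """Analogue of the existing byte-array form: return the raw value bytes."""
    def decode(self, record: ConsumerRecordStub) -> Any:
        return record.value


class JsonDecoder:
    """Example of an arbitrary, user-supplied decoding mechanism."""
    def decode(self, record: ConsumerRecordStub) -> Any:
        return json.loads(record.value.decode("utf-8"))


# Hypothetical config key and registry; a JVM version would more likely load a class by name.
DECODER_KEY = "example.kafka.record.decoder"
DECODERS: Dict[str, Callable[[], RecordDecoder]] = {
    "bytes": ByteArrayDecoder,
    "json": JsonDecoder,
}


def decoder_from_config(props: Mapping[str, str]) -> RecordDecoder:
    """Pick the decoder named in the job properties, defaulting to raw bytes."""
    name = props.get(DECODER_KEY, "bytes")
    try:
        return DECODERS[name]()
    except KeyError:
        raise ValueError(f"unknown decoder: {name}") from None
```

The default keeps today's behavior, while any new format only needs another entry that implements `decode`.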





[jira] [Created] (GOBBLIN-743) Initialize Gobblin application master services with dynamic config

2019-04-20 Thread Hung Tran (JIRA)
Hung Tran created GOBBLIN-743:
-

 Summary: Initialize Gobblin application master services with 
dynamic config
 Key: GOBBLIN-743
 URL: https://issues.apache.org/jira/browse/GOBBLIN-743
 Project: Apache Gobblin
  Issue Type: Task
Reporter: Hung Tran
Assignee: Hung Tran


The Gobblin application master needs to initialize services with the config 
generated by the dynamic config generator. One use case that requires this is 
passing SSL configuration to Kafka consumers and producers.





[jira] [Resolved] (GOBBLIN-726) Enable Schema Verification During Primary Dataset Deployment

2019-04-19 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-726.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2593
[https://github.com/apache/incubator-gobblin/pull/2593]

> Enable Schema Verification During Primary Dataset Deployment
> 
>
> Key: GOBBLIN-726
> URL: https://issues.apache.org/jira/browse/GOBBLIN-726
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Each distcp mapper will first read the schema of the file to be copied, and 
> abort if the file schema does not match the expected schema. 
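The per-mapper check can be sketched as follows, assuming the expected schema travels in the workunit under the `gobblin.copy.expectedSchema` property named in GOBBLIN-747 and that schemas are JSON text (as Avro schemas are). The function and exception names are hypothetical, and exact structural equality is a simplification; a real check might apply Avro schema-resolution rules instead.

```python
import json
from typing import Mapping

# Property name taken from GOBBLIN-747; everything else here is illustrative.
EXPECTED_SCHEMA_KEY = "gobblin.copy.expectedSchema"


class SchemaMismatchError(Exception):
    """Raised when a file's schema differs from the workunit's expected schema."""


def check_schema_before_copy(workunit_props: Mapping[str, str], file_schema_json: str) -> None:
    """Abort (raise) before copying if the file schema does not match the expectation."""
    expected = workunit_props.get(EXPECTED_SCHEMA_KEY)
    if expected is None:
        return  # No expected schema was set on the workunit: skip the check.
    # Compare parsed JSON so whitespace and key order do not cause false mismatches.
    if json.loads(expected) != json.loads(file_schema_json):
        raise SchemaMismatchError(
            f"schema mismatch: expected {expected}, found {file_schema_json}")
```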





[jira] [Resolved] (GOBBLIN-739) Add a way to propagate the Azkaban job config to Gobblin on YARN

2019-04-16 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-739.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2606
[https://github.com/apache/incubator-gobblin/pull/2606]

> Add a way to propagate the Azkaban job config to Gobblin on YARN
> 
>
> Key: GOBBLIN-739
> URL: https://issues.apache.org/jira/browse/GOBBLIN-739
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Hung Tran
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The AzkabanGobblinYarnAppLauncher can be used to launch a Gobblin application 
> master on YARN, which then loads configuration from an application.conf file. 
> Currently, the application.conf is pre-generated and packaged with the 
> Azkaban job zip. This results in duplication of config between the Azkaban 
> job properties and the application.conf file. It also doesn't allow user 
> overrides in the Azkaban UI to be propagated to the app master and containers.
> A config should be added to specify an output path to which the Azkaban job 
> config is written in HOCON format. Gobblin YARN configs such as 
> gobblin.yarn.app.master.files.local and gobblin.yarn.container.files.local 
> can then be set to point to the output file.
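Writing flat job properties out as HOCON can be sketched as below. This is an illustration, not the launcher's code: `to_hocon` and `write_job_config` are hypothetical names, and the output path is whatever the proposed (unnamed) config would specify. Each dotted key becomes a HOCON path expression with every segment quoted, and values are emitted as quoted strings. Because HOCON quoted strings use JSON syntax, `json.dumps` produces valid quoting.

```python
import json
from typing import Mapping


def to_hocon(props: Mapping[str, str]) -> str:
    """Render flat job properties as HOCON, one path expression per line."""
    lines = []
    for key in sorted(props):
        # Quote each dot-separated segment so characters such as ':' or '$' stay literal.
        path = ".".join(json.dumps(segment) for segment in key.split("."))
        lines.append(f"{path} = {json.dumps(str(props[key]))}")
    return "\n".join(lines) + "\n"


def write_job_config(props: Mapping[str, str], output_path: str) -> None:
    """Write the rendered config where the app master/container file lists can pick it up."""
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(to_hocon(props))
```

One caveat with this approach: Java properties allow both `a` and `a.b` as keys, but in HOCON `a` cannot be both a string and an object. Under HOCON's override rules, the later line wins.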




