[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=317008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317008 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 22:19 Start Date: 23/Sep/19 22:19 Worklog Time Spent: 10m Work Description: codecov-io commented on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=h1) Report
> Merging [#2722](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9bf9a882427e98e7f4ef089c4ca1bde42f4b36a3?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `1.64%`.

```diff
@@             Coverage Diff              @@
##             master    #2722      +/-   ##
+ Coverage     45.04%   45.05%   +<.01%
- Complexity     8739     8780      +41
  Files          1880     1886       +6
  Lines         70205    70642     +437
  Branches       7707     7745      +38
+ Hits          31623    31826     +203
- Misses        35651    35870     +219
- Partials       2931     2946      +15
```

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...obblin/salesforce/SalesforceConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUNvbmZpZ3VyYXRpb25LZXlzLmphdmE=) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...ce/extractor/extract/restapi/RestApiExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9yZXN0YXBpL1Jlc3RBcGlFeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...rg/apache/gobblin/salesforce/SalesforceSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZVNvdXJjZS5qYXZh) | `19.74% <5.66%> (-3.02%)` | `12 <1> (+1)` | |
| [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `82.14% <0%> (-7.15%)` | `11% <0%> (-1%)` | |
| [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `78.87% <0%> (-2.38%)` | `16% <0%> (+11%)` | |
| [...apache/gobblin/runtime/local/LocalJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9jYWwvTG9jYWxKb2JMYXVuY2hlci5qYXZh) | `61.81% <0%> (-2.34%)` | `5% <0%> (ø)` | |
| [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvdWNoYmFzZS93cml0ZXIvQ291Y2hiYXNlV3JpdGVyLmphdmE=) | `64.39% <0%> (-1.89%)` | `15% <0%> (+4%)` | |
| [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `63.88% <0%> (-0.9%)` | `28% <0%> (-1%)` | |
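As a sanity check on the report's arithmetic, the headline percentages can be recomputed from the line counts in the coverage diff. This sketch assumes Codecov computes coverage as hits / (hits + misses + partials); the class and method names are illustrative only.

```java
import java.util.Locale;

// Recomputes the headline coverage percentage from raw line counts,
// assuming coverage = hits / (hits + misses + partials).
final class CoverageCheck {
    private CoverageCheck() {}

    /** Coverage rounded to two decimals, formatted like the report, e.g. "45.05%". */
    static String percent(long hits, long misses, long partials) {
        long lines = hits + misses + partials; // each tracked line is in exactly one bucket
        double ratio = 100.0 * hits / lines;
        return String.format(Locale.ROOT, "%.2f%%", ratio);
    }
}
```

With master's counts (31623 hits, 35651 misses, 2931 partials, 70205 lines) this gives 45.04%, and with the PR's counts (31826, 35870, 2946; 70642 lines) it gives 45.05%. The unrounded difference is about 0.009 points, which is why the report shows `+<.01%`.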
[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
codecov-io edited a comment on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100 The edited Codecov report carries the same summary (coverage 45.04% → 45.05%, diff coverage `1.64%`) and impacted-file rows as the version logged above, plus [.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=) at `53.91% <0%> (-0.51%)` coverage and `27% <0%> (ø)` complexity, and [38 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree-more) impacted files. [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=continue).
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327343275 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -869,7 +862,7 @@ public SalesforceBulkJobId getQueryResultIdsPkChunking(String entity, List
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316988 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 21:41 Start Date: 23/Sep/19 21:41 Worklog Time Spent: 10m Work Description: arekusuri commented on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-534296716 Hi @htran1 and @zxcware, please hold off on this pull request for a little while. We will try another approach to make sure the code is mature enough before we merge it into open source. The doc is here: https://docs.google.com/document/d/1fJ7Gju9tXR8WBbwxct0_l21Ijhb4hZJykYFDhDcRLp4/edit#heading=h.37qi9whhekol This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 316988) Time Spent: 8h 50m (was: 8h 40m)
> Add feature that enables PK-chunking in partition
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Alex Li
> Priority: Major
> Labels: salesforce
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms that split a giant query into multiple sub-queries. There are 3 mechanisms:
> * simple partition (split equally by time)
> * dynamic pre-partition (generate a histogram and split by row counts)
> * user-specified partition (set up time ranges in the job file)
> However, queries on some tables, such as Task and Contract, fail from time to time to fetch the full data.
> We may want to use PK-chunking to partition the query.
>
> The PK-chunking doc from SFDC: [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
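The issue contrasts time-based partitioning with PK chunking. Salesforce enables PK chunking on a Bulk API query job through the `Sforce-Enable-PKChunking` request header, after which the server splits the query into batches by record-ID range rather than by time, so each chunk holds a bounded number of rows no matter how records are spread over time. The sketch below only illustrates that splitting idea locally. It assumes numeric, sorted, unique keys (real Salesforce IDs are 15/18-character strings), and `PkChunker`, `Range`, and `chunk` are hypothetical names, not part of Gobblin or the Salesforce client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of primary-key chunking: split a sorted key space into
// contiguous [lowInclusive, highExclusive) ranges holding at most chunkSize keys each.
final class PkChunker {
    private PkChunker() {}

    static final class Range {
        final long lowInclusive;
        final long highExclusive;

        Range(long lowInclusive, long highExclusive) {
            this.lowInclusive = lowInclusive;
            this.highExclusive = highExclusive;
        }

        /** The filter a sub-query for this chunk would add. */
        String toPredicate(String pkColumn) {
            return pkColumn + " >= " + lowInclusive + " AND " + pkColumn + " < " + highExclusive;
        }
    }

    /** Assumes sortedKeys is ascending and duplicate-free. */
    static List<Range> chunk(long[] sortedKeys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (int start = 0; start < sortedKeys.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, sortedKeys.length); // exclusive index
            long low = sortedKeys[start];
            // Upper bound: first key of the next chunk, or one past the last key.
            long high = end < sortedKeys.length ? sortedKeys[end] : sortedKeys[end - 1] + 1;
            ranges.add(new Range(low, high));
        }
        return ranges;
    }
}
```

For keys `{1, 2, 5, 9, 10}` and a chunk size of 2, this yields the ranges [1, 5), [5, 10) and [10, 11), so the first sub-query would filter on `Id >= 1 AND Id < 5`.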
[jira] [Work logged] (GOBBLIN-881) Add job tag field that can be used to filter job statuses
[ https://issues.apache.org/jira/browse/GOBBLIN-881?focusedWorklogId=316973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316973 ] ASF GitHub Bot logged work on GOBBLIN-881: -- Author: ASF GitHub Bot Created on: 23/Sep/19 20:56 Start Date: 23/Sep/19 20:56 Worklog Time Spent: 10m Work Description: arjun4084346 commented on issue #2735: [GOBBLIN-881] Add job tag field that can be used to filter job statuses URL: https://github.com/apache/incubator-gobblin/pull/2735#issuecomment-534281157 +1 LGTM. Issue Time Tracking --- Worklog Id: (was: 316973) Time Spent: 40m (was: 0.5h)
> Add job tag field that can be used to filter job statuses
>
> Key: GOBBLIN-881
> URL: https://issues.apache.org/jira/browse/GOBBLIN-881
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Jack Moseley
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
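The issue title only says that a job tag field should be usable to filter job statuses. The sketch below shows what such a filter could look like; `JobTagFilter`, `JobStatus`, and `withTag` are hypothetical names, not Gobblin's actual job-status API, and treating untagged jobs as never matching a non-null tag is an assumption.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical types illustrating tag-based filtering of job statuses.
final class JobTagFilter {
    private JobTagFilter() {}

    static final class JobStatus {
        final String jobName;
        final String executionStatus;
        final String jobTag; // null when the job was launched without a tag

        JobStatus(String jobName, String executionStatus, String jobTag) {
            this.jobName = jobName;
            this.executionStatus = executionStatus;
            this.jobTag = jobTag;
        }
    }

    /** Keeps statuses whose tag equals {@code tag}, preserving the input order. */
    static List<JobStatus> withTag(List<JobStatus> statuses, String tag) {
        return statuses.stream()
                .filter(s -> Objects.equals(s.jobTag, tag))
                .collect(Collectors.toList());
    }
}
```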
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316920 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 18:54 Start Date: 23/Sep/19 18:54 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327275396 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String column, long value, String v String[] batchIdResultIdArray = partitionPkChunkingBatchIdResultIterator.next().split(":"); String batchId = batchIdResultIdArray[0]; String resultId = batchIdResultIdArray[1]; +log.info(String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s) ", Review comment: Thanks for the offline talk; fixed. :) Issue Time Tracking --- Worklog Id: (was: 316920) Time Spent: 8h 40m (was: 8.5h)
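The reviewed snippet splits each iterator entry on `:` and indexes `[0]` and `[1]` directly, so a malformed entry would surface as an `ArrayIndexOutOfBoundsException` rather than a clear error. A minimal sketch of a validating parser, assuming entries always have the form `<batchId>:<resultId>`; `BatchResultId`, `parse`, and `describe` are hypothetical names, not code from the PR.

```java
// Hypothetical helper: parse a "<batchId>:<resultId>" entry with explicit validation.
final class BatchResultId {
    final String batchId;
    final String resultId;

    private BatchResultId(String batchId, String resultId) {
        this.batchId = batchId;
        this.resultId = resultId;
    }

    static BatchResultId parse(String entry) {
        // Limit -1 keeps trailing empty strings, so "abc:" is rejected instead of silently shortened.
        String[] parts = entry.split(":", -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected <batchId>:<resultId> but got: " + entry);
        }
        return new BatchResultId(parts[0], parts[1]);
    }

    /** A log message in the style of the reviewed log.info call. */
    String describe(String jobId) {
        return String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s)",
                jobId, batchId, resultId);
    }
}
```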
[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config
[ https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316885 ] ASF GitHub Bot logged work on GOBBLIN-853: -- Author: ASF GitHub Bot Created on: 23/Sep/19 18:05 Start Date: 23/Sep/19 18:05 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327250309 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java ## @@ -197,6 +210,87 @@ public void awaitHealthy() throws InterruptedException { return jobExecutionPlanDag; } + /** + * If {@link FlowSpec} has {@link #DATASET_SUBPATHS_KEY}, split it into multiple flowSpecs using a provided base input + * and base output path to generate multiple source/destination paths. + */ + private static List<FlowSpec> splitFlowSpec(FlowSpec flowSpec) { +long flowExecutionId = FlowUtils.getOrCreateFlowExecutionId(flowSpec); +List<FlowSpec> flowSpecs = new ArrayList<>(); + +if (flowSpec.getConfig().hasPath(DATASET_SUBPATHS_KEY)) { + List<String> datasetSubpaths = ConfigUtils.getStringList(flowSpec.getConfig(), DATASET_SUBPATHS_KEY); + String baseInputPath = ConfigUtils.getString(flowSpec.getConfig(), DATASET_BASE_INPUT_PATH_KEY, "/"); + String baseOutputPath = ConfigUtils.getString(flowSpec.getConfig(), DATASET_BASE_OUTPUT_PATH_KEY, "/"); + + for (String subPath : datasetSubpaths) { +Config newConfig = flowSpec.getConfig().withoutPath("dataset.subPaths") +.withValue(ConfigurationKeys.FLOW_EXECUTION_ID_KEY, ConfigValueFactory.fromAnyRef(flowExecutionId)) + .withValue(DatasetDescriptorConfigKeys.FLOW_INPUT_DATASET_DESCRIPTOR_PREFIX + "." 
+ DatasetDescriptorConfigKeys.PATH_KEY, +ConfigValueFactory.fromAnyRef(new Path(baseInputPath, subPath).toString())) + .withValue(DatasetDescriptorConfigKeys.FLOW_OUTPUT_DATASET_DESCRIPTOR_PREFIX + "." + DatasetDescriptorConfigKeys.PATH_KEY, +ConfigValueFactory.fromAnyRef(new Path(baseOutputPath, subPath).toString())); +flowSpecs.add(copyFlowSpecWithNewConfig(flowSpec, newConfig)); + } +} else { + return splitFlowSpecByNumber(flowSpec); +} + +return flowSpecs; + } + + /** + * If {@link FlowSpec} has config keys like configKey.0, configKey.1, split it into multiple flowSpecs on these properties. + * Properties that do not specify numbers will be present in all returned flowSpecs. + */ + private static List<FlowSpec> splitFlowSpecByNumber(FlowSpec flowSpec) { Review comment: We should probably revisit this later. IIUC, there are 2 possible ways to specify multi-dataset flows, and the base-path/sub-path form is a special case of splitting flowspecs by number. It is better to focus on the common use case first, i.e., the one where we have a common base path with different sub-paths. Issue Time Tracking --- Worklog Id: (was: 316885) Time Spent: 1h (was: 50m)
> Support multiple paths specified in flow config
>
> Key: GOBBLIN-853
> URL: https://issues.apache.org/jira/browse/GOBBLIN-853
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Jack Moseley
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
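The Javadoc of `splitFlowSpecByNumber` describes a second splitting mode: keys such as `configKey.0` and `configKey.1` fan out into separate specs, while un-numbered keys are copied into every spec. A minimal sketch of that rule, assuming a flat `Map<String, String>` instead of a `FlowSpec`/Typesafe `Config`. The class name is hypothetical, and two details are assumptions not stated in the source: numbered keys override a shared key of the same base name, and a config without numbered keys is returned unchanged as a single entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "split by number" on a flat key/value config.
final class NumberedConfigSplitter {
    private NumberedConfigSplitter() {}

    // Matches keys ending in ".<n>", e.g. "dataset.path.0" -> base "dataset.path", index 0.
    private static final Pattern NUMBERED_KEY = Pattern.compile("^(.+)\\.(\\d+)$");

    static List<Map<String, String>> split(Map<String, String> config) {
        Map<String, String> shared = new HashMap<>();
        TreeMap<Integer, Map<String, String>> byIndex = new TreeMap<>(); // keeps 0, 1, 2, ... order
        for (Map.Entry<String, String> e : config.entrySet()) {
            Matcher m = NUMBERED_KEY.matcher(e.getKey());
            if (m.matches()) {
                int index = Integer.parseInt(m.group(2));
                byIndex.computeIfAbsent(index, i -> new HashMap<>()).put(m.group(1), e.getValue());
            } else {
                shared.put(e.getKey(), e.getValue());
            }
        }
        List<Map<String, String>> result = new ArrayList<>();
        if (byIndex.isEmpty()) {
            result.add(new HashMap<>(config)); // nothing to split
            return result;
        }
        for (Map<String, String> numbered : byIndex.values()) {
            Map<String, String> merged = new HashMap<>(shared); // un-numbered keys go to every split
            merged.putAll(numbered);
            result.add(merged);
        }
        return result;
    }
}
```

This also makes the reviewer's point concrete: a base path plus a list of sub-paths can be expressed as numbered `path.N` keys, which is why the sub-path form reads as a special case of this one.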
[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config
[ https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316888 ] ASF GitHub Bot logged work on GOBBLIN-853: -- Author: ASF GitHub Bot Created on: 23/Sep/19 18:05 Start Date: 23/Sep/19 18:05 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253366 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java ## @@ -66,6 +70,10 @@ @Alpha @Slf4j public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler { + private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths"; + private static final String DATASET_BASE_INPUT_PATH_KEY = "dataset.baseInputPath"; + private static final String DATASET_BASE_OUTPUT_PATH_KEY = "dataset.baseOutputPath"; Review comment: Same comment as above. Issue Time Tracking --- Worklog Id: (was: 316888) Time Spent: 1h (was: 50m)
[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config
[ https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316886 ] ASF GitHub Bot logged work on GOBBLIN-853: -- Author: ASF GitHub Bot Created on: 23/Sep/19 18:05 Start Date: 23/Sep/19 18:05 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253294 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java ## @@ -66,6 +70,10 @@ @Alpha @Slf4j public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler { + private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths"; + private static final String DATASET_BASE_INPUT_PATH_KEY = "dataset.baseInputPath"; Review comment: Same comment as above. Issue Time Tracking --- Worklog Id: (was: 316886) Time Spent: 1h (was: 50m)
[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config
[ https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316884 ] ASF GitHub Bot logged work on GOBBLIN-853: -- Author: ASF GitHub Bot Created on: 23/Sep/19 18:05 Start Date: 23/Sep/19 18:05 Worklog Time Spent: 10m Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327245999 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/spec/JobExecutionPlan.java ## @@ -90,8 +91,10 @@ private static JobSpec buildJobSpec(FlowSpec flowSpec, Config jobConfig, Long fl String jobName = ConfigUtils.getString(jobConfig, ConfigurationKeys.JOB_NAME_KEY, ""); String edgeId = ConfigUtils.getString(jobConfig, FlowGraphConfigurationKeys.FLOW_EDGE_ID_KEY, ""); - //Modify the job name to include the flow group, flow name and edge id. - jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId); + // Modify the job name to include the flow group, flow name, edge id, and a random string to avoid collisions since + // job names are assumed to be unique within a dag. + jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId, Review comment: This may be problematic for jobs with the state store enabled, since the job name is used as the state store name. Issue Time Tracking --- Worklog Id: (was: 316884) Time Spent: 50m (was: 40m)
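The review concern can be illustrated with a toy state store keyed by job name: if each compilation appends a fresh random component, a later run looks up its state under a different name than the one an earlier run wrote. In this sketch the `_` separator stands in for `JOB_NAME_COMPONENT_SEPARATION_CHAR` (whose actual value is not shown), the UUID suffix is an assumed stand-in for "a random string", and a `HashMap` stands in for the real state store; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical demo: a random job-name component breaks state lookups keyed by job name.
final class JobNameStateStoreDemo {
    private JobNameStateStoreDemo() {}

    static final String SEPARATOR = "_"; // assumed stand-in for JOB_NAME_COMPONENT_SEPARATION_CHAR

    static String jobName(String flowGroup, String flowName, String jobName, String edgeId, String suffix) {
        return String.join(SEPARATOR, flowGroup, flowName, jobName, edgeId, suffix);
    }

    static String randomSuffix() {
        return UUID.randomUUID().toString();
    }

    /** Simulates two runs sharing a store keyed by job name; true if run 2 finds run 1's state. */
    static boolean stateFoundAcrossRuns(String firstRunSuffix, String secondRunSuffix) {
        Map<String, String> stateStore = new HashMap<>();
        stateStore.put(jobName("group", "flow", "job", "edge", firstRunSuffix), "watermark=100");
        return stateStore.containsKey(jobName("group", "flow", "job", "edge", secondRunSuffix));
    }
}
```

With a stable suffix the second run finds the stored state; with two independent random suffixes it does not, which is the collision-versus-continuity trade-off the reviewer is pointing at.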
[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config
sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253186 ## File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java ## @@ -66,6 +70,10 @@ @Alpha @Slf4j public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler { + private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths"; Review comment: Should this property be moved to ConfigurationKeys and renamed to gobblin.flow.dataset.subPaths?
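The `dataset.subPaths` key drives the base-path/sub-path split in `splitFlowSpec`: each sub-path is appended to `dataset.baseInputPath` and `dataset.baseOutputPath` (both defaulting to `/`) to form one source/destination pair per generated spec. The sketch below mirrors that loop on plain strings instead of Hadoop `Path` objects. `SubPathSplitter` and `PathPair` are hypothetical, and treating every sub-path as relative to its base (stripping a leading `/`) is an assumption; Hadoop's `Path` resolution may handle an absolute sub-path differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand base input/output paths and sub-paths into path pairs.
final class SubPathSplitter {
    private SubPathSplitter() {}

    static final class PathPair {
        final String input;
        final String output;

        PathPair(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    /** Joins parent and child with exactly one '/', treating the child as relative. */
    static String join(String parent, String child) {
        String p = parent.endsWith("/") ? parent.substring(0, parent.length() - 1) : parent;
        String c = child.startsWith("/") ? child.substring(1) : child;
        return p + "/" + c;
    }

    static List<PathPair> split(String baseInput, String baseOutput, List<String> subPaths) {
        List<PathPair> pairs = new ArrayList<>();
        for (String sub : subPaths) {
            // One source/destination pair per sub-path, sharing the same relative suffix.
            pairs.add(new PathPair(join(baseInput, sub), join(baseOutput, sub)));
        }
        return pairs;
    }
}
```

For example, base paths `/data/in` and `/data/out` with sub-paths `a/b` and `c` produce the pairs (`/data/in/a/b`, `/data/out/a/b`) and (`/data/in/c`, `/data/out/c`).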
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316878 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 17:59 Start Date: 23/Sep/19 17:59 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327251198 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -1130,35 +1120,28 @@ public void closeConnection() throws Exception { /** * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch - * @param batchInfoList list of batch info - * @param waitInterval the polling interval - * @return the last {@link BatchInfo} processed - * @throws InterruptedException - * @throws AsyncApiException */ - private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval) - throws InterruptedException, AsyncApiException { -BatchInfo batchInfo = null; + private void waitForPkBatches(String jobId, BatchInfoList batchInfoList, int waitInterval) { +long toWait = (long)waitInterval * 1000; BatchInfo[] batchInfos = batchInfoList.getBatchInfo(); - +log.info(String.format("Waiting for bulk (jobId=%s)", jobId)); Review comment: Fixed, thanks!
Issue Time Tracking --- Worklog Id: (was: 316878) Time Spent: 8.5h (was: 8h 20m) > Add feature that enables PK-chunking in partition > -- > > Key: GOBBLIN-865 > URL: https://issues.apache.org/jira/browse/GOBBLIN-865 > Project: Apache Gobblin > Issue Type: Task >Reporter: Alex Li >Priority: Major > Labels: salesforce > Time Spent: 8.5h > Remaining Estimate: 0h > > In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a giant query into multiple sub-queries. There are 3 mechanisms: > * simple partition (equally split by time) > * dynamic pre-partition (generate a histogram and split by row count) > * user-specified partition (set up a time range in the job file) > However, tables like Task and Contract fail from time to time to fetch the full data. > We may want to utilize PK-chunking to partition the query. > > The PK-chunking doc from SFDC: [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm] -- This message was sent by Atlassian Jira (v8.3.4#803005)
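The Javadoc kept in this hunk says the wait stops once all batches are complete or on the first failed batch, and the new code converts waitInterval to milliseconds with a (long) cast. A minimal sketch of such a loop follows, assuming a Supplier that returns the current batch states (standing in for whatever fetches the BatchInfoList) and a locally defined enum mirroring the Bulk API batch states. The maxPolls bound and the INTERRUPTED outcome are additions for the sketch; the diff does not show whether the real method has them.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the polling described in the waitForPkBatches Javadoc; not the Gobblin implementation.
class PkBatchPoller {
  // Local mirror of Salesforce Bulk API batch states, so the sketch needs no Salesforce dependency.
  enum BatchState { QUEUED, IN_PROGRESS, COMPLETED, FAILED, NOT_PROCESSED }

  enum Outcome { ALL_COMPLETED, FAILED, TIMED_OUT, INTERRUPTED }

  // Widen to long before multiplying: in int arithmetic, 3_000_000 * 1000 would overflow.
  static long toMillis(int seconds) {
    return (long) seconds * 1000;
  }

  // Polls until every batch is done or any batch fails, giving up after maxPolls.
  // NOT_PROCESSED counts as done because, with PK chunking, Salesforce gives the
  // original (un-chunked) batch that state.
  static Outcome waitForBatches(Supplier<List<BatchState>> fetchStates, int waitIntervalSeconds, int maxPolls) {
    long toWait = toMillis(waitIntervalSeconds);
    for (int poll = 0; poll < maxPolls; poll++) {
      boolean allDone = true;
      for (BatchState state : fetchStates.get()) {
        if (state == BatchState.FAILED) {
          return Outcome.FAILED; // stop on the first failed batch
        }
        if (state != BatchState.COMPLETED && state != BatchState.NOT_PROCESSED) {
          allDone = false;
        }
      }
      if (allDone) {
        return Outcome.ALL_COMPLETED;
      }
      try {
        Thread.sleep(toWait);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
        return Outcome.INTERRUPTED;
      }
    }
    return Outcome.TIMED_OUT;
  }
}
```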
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316876 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 17:58 Start Date: 23/Sep/19 17:58 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327250834 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String column, long value, String v String[] batchIdResultIdArray = partitionPkChunkingBatchIdResultIterator.next().split(":"); String batchId = batchIdResultIdArray[0]; String resultId = batchIdResultIdArray[1]; +log.info(String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s) ", Review comment: Thanks for the online talk; fixed. Issue Time Tracking --- Worklog Id: (was: 316876) Time Spent: 8h 10m (was: 8h)
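The first mechanism listed in the GOBBLIN-865 description, simple partition (equally split by time), can be sketched as dividing a [start, end) time window into n contiguous ranges. This is an illustration based on that one-line description, not the connector's actual partitioner. An equal split by time says nothing about how many rows land in each range, which is what the histogram-based dynamic pre-partition addresses.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of "simple partition (equally split by time)"; not the connector's code.
class EqualTimePartitioner {
  // Splits [startMillis, endMillis) into at most n contiguous, near-equal half-open ranges.
  static List<long[]> split(long startMillis, long endMillis, int n) {
    if (n <= 0 || endMillis <= startMillis) {
      throw new IllegalArgumentException("need n > 0 and endMillis > startMillis");
    }
    List<long[]> ranges = new ArrayList<>();
    long span = endMillis - startMillis;
    for (int i = 0; i < n; i++) {
      // Derive each boundary from the total span so rounding error does not accumulate.
      long lo = startMillis + span * i / n;
      long hi = startMillis + span * (i + 1) / n;
      if (hi > lo) { // skip empty ranges when n exceeds the span
        ranges.add(new long[] {lo, hi});
      }
    }
    return ranges;
  }
}
```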
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327250904 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -775,17 +775,15 @@ public SalesforceBulkJobId getQueryResultIdsPkChunking(String entity, List
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327250834 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String column, long value, String v String[] batchIdResultIdArray = partitionPkChunkingBatchIdResultIterator.next().split(":"); String batchId = batchIdResultIdArray[0]; String resultId = batchIdResultIdArray[1]; +log.info(String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s) ", Review comment: Thanks for the online talk; fixed.
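The hunk reads work-unit entries of the form batchId:resultId and takes them apart with split(":"). A small hypothetical value class for that encoding is sketched below; the encode side and the validation of malformed input are assumptions, not code from the PR.

```java
// Hypothetical value class for "batchId:resultId" strings; not code from the PR.
class BatchResultId {
  final String batchId;
  final String resultId;

  BatchResultId(String batchId, String resultId) {
    this.batchId = batchId;
    this.resultId = resultId;
  }

  // Produces the same "batchId:resultId" shape that the extractor splits on ':'.
  String encode() {
    return batchId + ":" + resultId;
  }

  // Limit 2 keeps anything after the first ':' in the result id; malformed input fails loudly
  // instead of surfacing later as an ArrayIndexOutOfBoundsException.
  static BatchResultId parse(String encoded) {
    String[] parts = encoded.split(":", 2);
    if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
      throw new IllegalArgumentException("expected batchId:resultId but got: " + encoded);
    }
    return new BatchResultId(parts[0], parts[1]);
  }
}
```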
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316874 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 17:51 Start Date: 23/Sep/19 17:51 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327247892 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String column, long value, String v String[] batchIdResultIdArray = partitionPkChunkingBatchIdResultIterator.next().split(":"); String batchId = batchIdResultIdArray[0]; String resultId = batchIdResultIdArray[1]; +log.info(String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s) ", Review comment: BTW, we use this pattern a lot, though; I copied the code :) I searched our code but didn't find a good example. Can you point me to some sample code? Are you talking about `MessageFormat`? Issue Time Tracking --- Worklog Id: (was: 316874) Time Spent: 8h (was: 7h 50m)
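On the formatting question: the two methods below build the same message, one with String.format (as in the diff) and one with java.text.MessageFormat. The thread does not say which alternative the reviewer had in mind, so this only compares the two JDK options; the class and method names are hypothetical.

```java
import java.text.MessageFormat;

// Hypothetical comparison of two JDK ways to build the PK-chunking log line.
class LogMessageFormats {
  // printf-style positional %s placeholders, as used in the diff.
  static String withStringFormat(String jobId, String batchId, String resultId) {
    return String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s)",
        jobId, batchId, resultId);
  }

  // Indexed {n} placeholders; single quotes in a MessageFormat pattern act as escape characters.
  static String withMessageFormat(String jobId, String batchId, String resultId) {
    return MessageFormat.format("PK-Chunking work unit: fetching file for (jobId={0}, batchId={1}, resultId={2})",
        jobId, batchId, resultId);
  }
}
```

Both variants format eagerly, even if INFO logging is off. If the extractor's log field comes from Lombok's @Slf4j (as on MultiHopFlowCompiler in PR #2709), SLF4J's own placeholders, e.g. log.info("PK-Chunking work unit: fetching file for (jobId={}, batchId={}, resultId={})", jobId, batchId, resultId), defer formatting until the level is known to be enabled. MessageFormat also applies locale-specific grouping to numeric arguments (1234 becomes "1,234" in an English locale), which can surprise when logging ids or counts.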
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316863 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 17:40 Start Date: 23/Sep/19 17:40 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327243183 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -857,7 +850,7 @@ public SalesforceBulkJobId getQueryResultIdsPkChunking(String entity, List Add feature that enables PK-chunking in partition > -- > > Key: GOBBLIN-865 > URL: https://issues.apache.org/jira/browse/GOBBLIN-865 > Project: Apache Gobblin > Issue Type: Task >Reporter: Alex Li >Priority: Major > Labels: salesforce > Time Spent: 7h 50m > Remaining Estimate: 0h
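The GOBBLIN-865 description proposes Salesforce PK chunking, which splits a bulk query into batches by ranges of record Id. The sketch below only imitates that idea on an in-memory sorted list of Ids; in the real API the splitting happens server-side once the Sforce-Enable-PKChunking header is sent. The header name, the chunkSize=N value syntax, and the 100,000 default / 250,000 maximum come from the Salesforce documentation linked in the issue; the predicate strings and class name are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of PK chunking; the real chunking is done by Salesforce, not the client.
class PkChunkingSketch {
  static final String PK_CHUNKING_HEADER = "Sforce-Enable-PKChunking";
  static final int DEFAULT_CHUNK_SIZE = 100_000;
  static final int MAX_CHUNK_SIZE = 250_000;

  // Header value requesting a given chunk size, checked against the documented maximum.
  static String headerValue(int chunkSize) {
    if (chunkSize <= 0 || chunkSize > MAX_CHUNK_SIZE) {
      throw new IllegalArgumentException("chunkSize must be in 1.." + MAX_CHUNK_SIZE);
    }
    return "chunkSize=" + chunkSize;
  }

  // One illustrative WHERE clause per chunk of at most chunkSize Ids; ranges are half-open,
  // bounded below by the chunk's first Id and above by the next chunk's first Id.
  static List<String> chunkPredicates(List<String> sortedIds, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    List<String> predicates = new ArrayList<>();
    for (int start = 0; start < sortedIds.size(); start += chunkSize) {
      int next = start + chunkSize;
      String lower = "Id >= '" + sortedIds.get(start) + "'";
      predicates.add(next < sortedIds.size()
          ? lower + " AND Id < '" + sortedIds.get(next) + "'"
          : lower); // the last chunk has no upper bound
    }
    return predicates;
  }
}
```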
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327243183 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -857,7 +850,7 @@ public SalesforceBulkJobId getQueryResultIdsPkChunking(String entity, List
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316858 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 23/Sep/19 17:36 Start Date: 23/Sep/19 17:36 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327241495 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -869,7 +862,7 @@ public SalesforceBulkJobId getQueryResultIdsPkChunking(String entity, List Add feature that enables PK-chunking in partition > -- > > Key: GOBBLIN-865 > URL: https://issues.apache.org/jira/browse/GOBBLIN-865 > Project: Apache Gobblin > Issue Type: Task >Reporter: Alex Li >Priority: Major > Labels: salesforce > Time Spent: 7h 40m > Remaining Estimate: 0h
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327241495 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -869,7 +862,7 @@ public SalesforceBulkJobId getQueryResultIdsPkChunking(String entity, List
[jira] [Resolved] (GOBBLIN-885) Fix ORC-Compaction bug in type-casting
[ https://issues.apache.org/jira/browse/GOBBLIN-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hung Tran resolved GOBBLIN-885. --- Fix Version/s: 0.15.0 Resolution: Fixed Issue resolved by pull request #2738 [https://github.com/apache/incubator-gobblin/pull/2738] > Fix ORC-Compaction bug in type-casting > -- > > Key: GOBBLIN-885 > URL: https://issues.apache.org/jira/browse/GOBBLIN-885 > Project: Apache Gobblin > Issue Type: Improvement >Reporter: Lei Sun >Priority: Major > Fix For: 0.15.0 > > Time Spent: 40m > Remaining Estimate: 0h >
[jira] [Work logged] (GOBBLIN-885) Fix ORC-Compaction bug in type-casting
[ https://issues.apache.org/jira/browse/GOBBLIN-885?focusedWorklogId=316838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316838 ] ASF GitHub Bot logged work on GOBBLIN-885: -- Author: ASF GitHub Bot Created on: 23/Sep/19 17:16 Start Date: 23/Sep/19 17:16 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #2738: [GOBBLIN-885]Fix orc-Compaction bug in non-dedup mode and add unit-test URL: https://github.com/apache/incubator-gobblin/pull/2738 Issue Time Tracking --- Worklog Id: (was: 316838) Time Spent: 40m (was: 0.5h)