[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=317008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317008
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 22:19
Start Date: 23/Sep/19 22:19
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2722: GOBBLIN-865: Add 
feature that enables PK-chunking in partition
URL: 
https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=h1) Report
   > Merging [#2722](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9bf9a882427e98e7f4ef089c4ca1bde42f4b36a3?src=pr=desc) will **increase** coverage by `<.01%`.
   > The diff coverage is `1.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2722      +/-   ##
   
   + Coverage     45.04%   45.05%   +<.01%
   - Complexity     8739     8780      +41
   
     Files          1880     1886       +6
     Lines         70205    70642     +437
     Branches       7707     7745      +38
   
   + Hits          31623    31826     +203
   - Misses        35651    35870     +219
   - Partials       2931     2946      +15
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...obblin/salesforce/SalesforceConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUNvbmZpZ3VyYXRpb25LZXlzLmphdmE=) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...ce/extractor/extract/restapi/RestApiExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9yZXN0YXBpL1Jlc3RBcGlFeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...rg/apache/gobblin/salesforce/SalesforceSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZVNvdXJjZS5qYXZh) | `19.74% <5.66%> (-3.02%)` | `12 <1> (+1)` | |
   | [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `82.14% <0%> (-7.15%)` | `11% <0%> (-1%)` | |
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `78.87% <0%> (-2.38%)` | `16% <0%> (+11%)` | |
   | [...apache/gobblin/runtime/local/LocalJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9jYWwvTG9jYWxKb2JMYXVuY2hlci5qYXZh) | `61.81% <0%> (-2.34%)` | `5% <0%> (ø)` | |
   | [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvdWNoYmFzZS93cml0ZXIvQ291Y2hiYXNlV3JpdGVyLmphdmE=) | `64.39% <0%> (-1.89%)` | `15% <0%> (+4%)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `63.88% <0%> (-0.9%)` | `28% <0%> (-1%)` | |
   | 

[GitHub] [incubator-gobblin] codecov-io edited a comment on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-23 Thread GitBox
codecov-io edited a comment on issue #2722: GOBBLIN-865: Add feature that 
enables PK-chunking in partition
URL: 
https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=h1) Report
   > Merging [#2722](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9bf9a882427e98e7f4ef089c4ca1bde42f4b36a3?src=pr=desc) will **increase** coverage by `<.01%`.
   > The diff coverage is `1.64%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2722      +/-   ##
   
   + Coverage     45.04%   45.05%   +<.01%
   - Complexity     8739     8780      +41
   
     Files          1880     1886       +6
     Lines         70205    70642     +437
     Branches       7707     7745      +38
   
   + Hits          31623    31826     +203
   - Misses        35651    35870     +219
   - Partials       2931     2946      +15
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...obblin/salesforce/SalesforceConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUNvbmZpZ3VyYXRpb25LZXlzLmphdmE=) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...ce/extractor/extract/restapi/RestApiExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jb3JlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NvdXJjZS9leHRyYWN0b3IvZXh0cmFjdC9yZXN0YXBpL1Jlc3RBcGlFeHRyYWN0b3IuamF2YQ==) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...rg/apache/gobblin/salesforce/SalesforceSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZVNvdXJjZS5qYXZh) | `19.74% <5.66%> (-3.02%)` | `12 <1> (+1)` | |
   | [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `82.14% <0%> (-7.15%)` | `11% <0%> (-1%)` | |
   | [...bblin/compaction/mapreduce/orc/OrcValueMapper.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL29yYy9PcmNWYWx1ZU1hcHBlci5qYXZh) | `78.87% <0%> (-2.38%)` | `16% <0%> (+11%)` | |
   | [...apache/gobblin/runtime/local/LocalJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9jYWwvTG9jYWxKb2JMYXVuY2hlci5qYXZh) | `61.81% <0%> (-2.34%)` | `5% <0%> (ø)` | |
   | [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvdWNoYmFzZS93cml0ZXIvQ291Y2hiYXNlV3JpdGVyLmphdmE=) | `64.39% <0%> (-1.89%)` | `15% <0%> (+4%)` | |
   | [.../org/apache/gobblin/cluster/GobblinTaskRunner.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpblRhc2tSdW5uZXIuamF2YQ==) | `63.88% <0%> (-0.9%)` | `28% <0%> (-1%)` | |
   | [.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=) | `53.91% <0%> (-0.51%)` | `27% <0%> (ø)` | |
   | ... and [38 more](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=continue).
   > **Legend** - [Click here to 

[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-23 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327343275
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -869,7 +862,7 @@ public SalesforceBulkJobId 
getQueryResultIdsPkChunking(String entity, List

[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316988
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 21:41
Start Date: 23/Sep/19 21:41
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on issue #2722: GOBBLIN-865: Add 
feature that enables PK-chunking in partition
URL: 
https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-534296716
 
 
   Hi @htran1 and @zxcware, please hold off on this pull request for a little while.
   We will try another approach to ensure the code is mature enough before we 
merge it into open source. 
   The doc is here - 
https://docs.google.com/document/d/1fJ7Gju9tXR8WBbwxct0_l21Ijhb4hZJykYFDhDcRLp4/edit#heading=h.37qi9whhekol
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 316988)
Time Spent: 8h 50m  (was: 8h 40m)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple subqueries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user specified partition (set up time range in job file)
> However, tables such as Task and Contract fail from time to time to 
> fetch full data.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]
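
Of the three existing mechanisms listed in the issue description, the simple partition (an equal split by time) is the easiest to illustrate. The following is a minimal sketch only, assuming timestamps are epoch milliseconds and ranges are half-open; `SimpleTimePartitioner` and `split` are hypothetical names and are not taken from the Gobblin code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Gobblin's actual partitioner.
final class SimpleTimePartitioner {

  /**
   * Splits the half-open range [startMillis, endMillis) into `parts`
   * contiguous sub-ranges of near-equal width. Each element is {low, high}.
   */
  static List<long[]> split(long startMillis, long endMillis, int parts) {
    if (parts <= 0 || endMillis < startMillis) {
      throw new IllegalArgumentException("parts must be positive and end >= start");
    }
    List<long[]> ranges = new ArrayList<>();
    long width = endMillis - startMillis;
    for (int i = 0; i < parts; i++) {
      // Computing both bounds from the same formula keeps the ranges gap-free.
      long low = startMillis + width * i / parts;
      long high = startMillis + width * (i + 1) / parts;
      ranges.add(new long[] {low, high});
    }
    return ranges;
  }
}
```

Each sub-range would then back one sub-query. A time-based split like this assumes rows are spread fairly evenly over time, which may not hold for every table.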



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Work logged] (GOBBLIN-881) Add job tag field that can be used to filter job statuses

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-881?focusedWorklogId=316973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316973
 ]

ASF GitHub Bot logged work on GOBBLIN-881:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 20:56
Start Date: 23/Sep/19 20:56
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on issue #2735: [GOBBLIN-881] 
Add job tag field that can be used to filter job statuses
URL: 
https://github.com/apache/incubator-gobblin/pull/2735#issuecomment-534281157
 
 
   +1 LGTM.
 



Issue Time Tracking
---

Worklog Id: (was: 316973)
Time Spent: 40m  (was: 0.5h)

> Add job tag field that can be used to filter job statuses
> -
>
> Key: GOBBLIN-881
> URL: https://issues.apache.org/jira/browse/GOBBLIN-881
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jack Moseley
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>








[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316920
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 18:54
Start Date: 23/Sep/19 18:54
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327275396
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String column, long value, String v
 String[] batchIdResultIdArray = partitionPkChunkingBatchIdResultIterator.next().split(":");
 String batchId = batchIdResultIdArray[0];
 String resultId = batchIdResultIdArray[1];
+log.info(String.format("PK-Chunking work unit: fetching file for (jobId=%s, batchId=%s, resultId=%s) ",
 
 Review comment:
   thanks for offline talk, fixed. :)
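
The hunk above splits each iterator entry on `:` and reads the batch ID and result ID by index. A minimal sketch of the same parsing with an explicit check on the entry's shape is shown here; `BatchResultId` and `parse` are hypothetical names, and the `batchId:resultId` layout is inferred only from the `split(":")` call in the diff.

```java
// Hypothetical value class for illustration; not part of the pull request.
final class BatchResultId {
  final String batchId;
  final String resultId;

  private BatchResultId(String batchId, String resultId) {
    this.batchId = batchId;
    this.resultId = resultId;
  }

  /** Parses an entry of the assumed form "batchId:resultId". */
  static BatchResultId parse(String entry) {
    String[] parts = entry.split(":");
    // Reject entries with a missing half or extra separators instead of
    // failing later with an ArrayIndexOutOfBoundsException.
    if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
      throw new IllegalArgumentException("Expected batchId:resultId but got: " + entry);
    }
    return new BatchResultId(parts[0], parts[1]);
  }
}
```

Failing fast here would keep a malformed entry from surfacing as an index error further down in the extractor.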
 



Issue Time Tracking
---

Worklog Id: (was: 316920)
Time Spent: 8h 40m  (was: 8.5h)





[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316885
 ]

ASF GitHub Bot logged work on GOBBLIN-853:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 18:05
Start Date: 23/Sep/19 18:05
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] 
Support multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327250309
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java
 ##
 @@ -197,6 +210,87 @@ public void awaitHealthy() throws InterruptedException {
     return jobExecutionPlanDag;
   }
 
+  /**
+   * If {@link FlowSpec} has {@link #DATASET_SUBPATHS_KEY}, split it into multiple flowSpecs using a provided base input
+   * and base output path to generate multiple source/destination paths.
+   */
+  private static List<FlowSpec> splitFlowSpec(FlowSpec flowSpec) {
+    long flowExecutionId = FlowUtils.getOrCreateFlowExecutionId(flowSpec);
+    List<FlowSpec> flowSpecs = new ArrayList<>();
+
+    if (flowSpec.getConfig().hasPath(DATASET_SUBPATHS_KEY)) {
+      List<String> datasetSubpaths = ConfigUtils.getStringList(flowSpec.getConfig(), DATASET_SUBPATHS_KEY);
+      String baseInputPath = ConfigUtils.getString(flowSpec.getConfig(), DATASET_BASE_INPUT_PATH_KEY, "/");
+      String baseOutputPath = ConfigUtils.getString(flowSpec.getConfig(), DATASET_BASE_OUTPUT_PATH_KEY, "/");
+
+      for (String subPath : datasetSubpaths) {
+        Config newConfig = flowSpec.getConfig().withoutPath("dataset.subPaths")
+            .withValue(ConfigurationKeys.FLOW_EXECUTION_ID_KEY, ConfigValueFactory.fromAnyRef(flowExecutionId))
+            .withValue(DatasetDescriptorConfigKeys.FLOW_INPUT_DATASET_DESCRIPTOR_PREFIX + "." + DatasetDescriptorConfigKeys.PATH_KEY,
+                ConfigValueFactory.fromAnyRef(new Path(baseInputPath, subPath).toString()))
+            .withValue(DatasetDescriptorConfigKeys.FLOW_OUTPUT_DATASET_DESCRIPTOR_PREFIX + "." + DatasetDescriptorConfigKeys.PATH_KEY,
+                ConfigValueFactory.fromAnyRef(new Path(baseOutputPath, subPath).toString()));
+        flowSpecs.add(copyFlowSpecWithNewConfig(flowSpec, newConfig));
+      }
+    } else {
+      return splitFlowSpecByNumber(flowSpec);
+    }
+
+    return flowSpecs;
+  }
+
+  /**
+   * If {@link FlowSpec} has config keys like configKey.0, configKey.1, split it into multiple flowSpecs on these properties.
+   * Properties that do not specify numbers will be present in all returned flowSpecs.
+   */
+  private static List<FlowSpec> splitFlowSpecByNumber(FlowSpec flowSpec) {
 
 Review comment:
   We probably should revisit this later. IIUC, it looks like there are 2 
possible ways to specify multi-dataset flows, and the base-path/sub-path is a 
special case of splitting flowspecs by number. Better to focus on the common 
use case first i.e. the one where we have a common base path with different 
sub-paths.
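
The body of `splitFlowSpecByNumber` is not shown in the hunk, so the following is only a sketch of the behaviour its Javadoc describes: keys ending in `.0`, `.1`, and so on are grouped by index, and keys without a number are copied into every result. It uses plain `Map<String, String>` in place of Typesafe `Config`, and `NumberedKeySplitter` is a hypothetical name. Returning a single copy when no numbered keys exist is an assumption, not something the source states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of splitting a flat config on numbered keys.
final class NumberedKeySplitter {
  // Matches keys such as "configKey.0" and captures the base key and the index.
  private static final Pattern NUMBERED = Pattern.compile("^(.+)\\.(\\d+)$");

  static List<Map<String, String>> split(Map<String, String> config) {
    Map<String, String> shared = new LinkedHashMap<>();
    // TreeMap keeps the per-index groups ordered by index.
    TreeMap<Integer, Map<String, String>> byIndex = new TreeMap<>();

    for (Map.Entry<String, String> e : config.entrySet()) {
      Matcher m = NUMBERED.matcher(e.getKey());
      if (m.matches()) {
        int index = Integer.parseInt(m.group(2));
        byIndex.computeIfAbsent(index, k -> new LinkedHashMap<>())
            .put(m.group(1), e.getValue());
      } else {
        shared.put(e.getKey(), e.getValue());
      }
    }

    List<Map<String, String>> result = new ArrayList<>();
    if (byIndex.isEmpty()) {
      // Assumption: with no numbered keys the config is returned unsplit.
      result.add(new LinkedHashMap<>(shared));
      return result;
    }
    for (Map<String, String> numbered : byIndex.values()) {
      Map<String, String> merged = new LinkedHashMap<>(shared);
      merged.putAll(numbered);
      result.add(merged);
    }
    return result;
  }
}
```

Seen this way, the base-path/sub-path form discussed in the review comment can be expressed as numbered keys in which only the path differs per index.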
 



Issue Time Tracking
---

Worklog Id: (was: 316885)
Time Spent: 1h  (was: 50m)

> Support multiple paths specified in flow config
> ---
>
> Key: GOBBLIN-853
> URL: https://issues.apache.org/jira/browse/GOBBLIN-853
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jack Moseley
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316888
 ]

ASF GitHub Bot logged work on GOBBLIN-853:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 18:05
Start Date: 23/Sep/19 18:05
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] 
Support multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253366
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java
 ##
 @@ -66,6 +70,10 @@
 @Alpha
 @Slf4j
 public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler {
+  private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths";
+  private static final String DATASET_BASE_INPUT_PATH_KEY = "dataset.baseInputPath";
+  private static final String DATASET_BASE_OUTPUT_PATH_KEY = "dataset.baseOutputPath";
 
 Review comment:
   Same comment as above.
 



Issue Time Tracking
---

Worklog Id: (was: 316888)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316886
 ]

ASF GitHub Bot logged work on GOBBLIN-853:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 18:05
Start Date: 23/Sep/19 18:05
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] 
Support multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253294
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java
 ##
 @@ -66,6 +70,10 @@
 @Alpha
 @Slf4j
 public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler {
+  private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths";
+  private static final String DATASET_BASE_INPUT_PATH_KEY = "dataset.baseInputPath";
 
 Review comment:
   Same comment as above.
 



Issue Time Tracking
---

Worklog Id: (was: 316886)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (GOBBLIN-853) Support multiple paths specified in flow config

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-853?focusedWorklogId=316884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316884
 ]

ASF GitHub Bot logged work on GOBBLIN-853:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 18:05
Start Date: 23/Sep/19 18:05
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2709: [GOBBLIN-853] 
Support multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327245999
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/spec/JobExecutionPlan.java
 ##
 @@ -90,8 +91,10 @@ private static JobSpec buildJobSpec(FlowSpec flowSpec, Config jobConfig, Long fl
   String jobName = ConfigUtils.getString(jobConfig, ConfigurationKeys.JOB_NAME_KEY, "");
   String edgeId = ConfigUtils.getString(jobConfig, FlowGraphConfigurationKeys.FLOW_EDGE_ID_KEY, "");
 
-  //Modify the job name to include the flow group, flow name and edge id.
-  jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId);
+  // Modify the job name to include the flow group, flow name, edge id, and a random string to avoid collisions since
+  // job names are assumed to be unique within a dag.
+  jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId,
 
 Review comment:
   This may be problematic for jobs with state store enabled, as the job name 
is used for store name.
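
The concern in this review comment can be illustrated with a small sketch. It assumes an underscore separator and uses a UUID as a stand-in for the random string; the diff shows neither the value of `JOB_NAME_COMPONENT_SEPARATION_CHAR` nor what the pull request actually appends, and `JobNameSketch` is a hypothetical class.

```java
import java.util.UUID;

// Hypothetical demo class; not part of Gobblin.
final class JobNameSketch {
  // Assumed separator; the real constant's value is not shown in the diff.
  private static final String SEPARATOR = "_";

  /** Name built only from flow group, flow name, job name and edge id. */
  static String stableName(String flowGroup, String flowName, String jobName, String edgeId) {
    return String.join(SEPARATOR, flowGroup, flowName, jobName, edgeId);
  }

  /** Same name with a random suffix; the UUID is a stand-in for the PR's random string. */
  static String randomizedName(String flowGroup, String flowName, String jobName, String edgeId) {
    return String.join(SEPARATOR,
        stableName(flowGroup, flowName, jobName, edgeId),
        UUID.randomUUID().toString());
  }
}
```

Two calls to `stableName` with the same arguments return the same string, while two calls to `randomizedName` do not. If the state store is keyed by job name, as the comment says, a later run would not find the state written under an earlier run's randomized name.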
 



Issue Time Tracking
---

Worklog Id: (was: 316884)
Time Spent: 50m  (was: 40m)





[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config

2019-09-23 Thread GitBox
sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support 
multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253186
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java
 ##
 @@ -66,6 +70,10 @@
 @Alpha
 @Slf4j
 public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler {
+  private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths";
 
 Review comment:
   Should this property be moved to ConfigurationKeys and renamed as 
gobblin.flow.dataset.subPaths?




With regards,
Apache Git Services


[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config

2019-09-23 Thread GitBox
sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support 
multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327245999
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/spec/JobExecutionPlan.java
 ##
 @@ -90,8 +91,10 @@ private static JobSpec buildJobSpec(FlowSpec flowSpec, 
Config jobConfig, Long fl
   String jobName = ConfigUtils.getString(jobConfig, 
ConfigurationKeys.JOB_NAME_KEY, "");
   String edgeId = ConfigUtils.getString(jobConfig, 
FlowGraphConfigurationKeys.FLOW_EDGE_ID_KEY, "");
 
-  //Modify the job name to include the flow group, flow name and edge id.
-  jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, 
flowName, jobName, edgeId);
+  // Modify the job name to include the flow group, flow name, edge id, 
and a random string to avoid collisions since
+  // job names are assumed to be unique within a dag.
+  jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, 
flowName, jobName, edgeId,
 
 Review comment:
   This may be problematic for jobs with state store enabled, as the job name 
is used as the store name.
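
To make the concern concrete, here is a small sketch under stated assumptions: a plain `Map` stands in for a state store keyed by job name, and `buildJobName` is a hypothetical helper that mimics the joined name with and without a random component. The `_` separator and the use of a UUID are assumptions, not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class JobNameStateSketch {
  // Hypothetical helper: joins the name components, optionally adding a random suffix.
  static String buildJobName(String flowGroup, String flowName, String jobName, String edgeId,
      boolean randomSuffix) {
    String base = String.join("_", flowGroup, flowName, jobName, edgeId);
    return randomSuffix ? base + "_" + UUID.randomUUID() : base;
  }

  // Simulates two runs of the same job: run 1 writes state under its name,
  // run 2 rebuilds the name and looks the state up again.
  static boolean secondRunFindsState(boolean randomSuffix) {
    Map<String, String> stateStore = new HashMap<>(); // stand-in for a store keyed by job name
    stateStore.put(buildJobName("g", "f", "j", "e", randomSuffix), "watermark=42");
    return stateStore.containsKey(buildJobName("g", "f", "j", "e", randomSuffix));
  }
}
```

With a stable name the second run finds the earlier state; with a fresh random suffix per run it does not, which is the failure mode the comment points at.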




[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support multiple paths specified in flow config

2019-09-23 Thread GitBox
sv2000 commented on a change in pull request #2709: [GOBBLIN-853] Support 
multiple paths specified in flow config
URL: https://github.com/apache/incubator-gobblin/pull/2709#discussion_r327253366
 
 

 ##
 File path: 
gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java
 ##
 @@ -66,6 +70,10 @@
 @Alpha
 @Slf4j
 public class MultiHopFlowCompiler extends BaseFlowToJobSpecCompiler {
+  private static final String DATASET_SUBPATHS_KEY = "dataset.subPaths";
+  private static final String DATASET_BASE_INPUT_PATH_KEY = 
"dataset.baseInputPath";
+  private static final String DATASET_BASE_OUTPUT_PATH_KEY = 
"dataset.baseOutputPath";
 
 Review comment:
   Same comment as above.




[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316878
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 17:59
Start Date: 23/Sep/19 17:59
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327251198
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -1130,35 +1120,28 @@ public void closeConnection() throws Exception {
 
   /**
* Waits for the PK batches to complete. The wait will stop after all 
batches are complete or on the first failed batch
-   * @param batchInfoList list of batch info
-   * @param waitInterval the polling interval
-   * @return the last {@link BatchInfo} processed
-   * @throws InterruptedException
-   * @throws AsyncApiException
*/
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
waitInterval)
-  throws InterruptedException, AsyncApiException {
-BatchInfo batchInfo = null;
+  private void waitForPkBatches(String jobId, BatchInfoList batchInfoList, int 
waitInterval)  {
+long toWait = (long)waitInterval * 1000;
 BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
-
+log.info(String.format("Waiting for bulk (jobId=%s)", jobId));
 
 Review comment:
   fixed. thanks!
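
The body of `waitForPkBatches` is not shown in this hunk, so the following is only a generic sketch of interval polling, not the PR's implementation. `PollSketch`, `waitUntil`, and the `maxPolls` cap are hypothetical; the only detail borrowed from the diff is widening the interval to `long` before converting seconds to milliseconds.

```java
import java.util.function.BooleanSupplier;

final class PollSketch {
  // Polls `done` until it returns true, sleeping between checks.
  // Returns the number of checks performed; gives up after maxPolls checks.
  static int waitUntil(BooleanSupplier done, int waitIntervalSeconds, int maxPolls) {
    long toWait = (long) waitIntervalSeconds * 1000; // widen first to avoid int overflow
    for (int polls = 1; polls <= maxPolls; polls++) {
      if (done.getAsBoolean()) {
        return polls;
      }
      try {
        Thread.sleep(toWait);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    throw new IllegalStateException("not complete after " + maxPolls + " polls");
  }
}
```

In a real extractor the `done` check would query batch status and also stop on the first failed batch, as the Javadoc in the diff describes.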
 



Issue Time Tracking
---

Worklog Id: (was: 316878)
Time Spent: 8.5h  (was: 8h 20m)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user-specified partition (set up time range in job file)
> However, fetching the full data for tables like Task and Contract fails from 
> time to time.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]
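
As an illustration of the first mechanism in the description above (simple partition, equally split by time), here is a minimal sketch. `TimePartitionSketch` and its `split` method are hypothetical and are not Gobblin's partitioner; the sketch only shows the arithmetic of cutting a `[start, end)` range into `n` near-equal, contiguous sub-ranges, one per sub-query.

```java
import java.util.ArrayList;
import java.util.List;

final class TimePartitionSketch {
  // Splits [startMillis, endMillis) into n contiguous ranges; each element is {low, high}.
  // Note: span * i can overflow for extremely large spans; fine for epoch-millisecond ranges.
  static List<long[]> split(long startMillis, long endMillis, int n) {
    if (n <= 0 || endMillis < startMillis) {
      throw new IllegalArgumentException("need n > 0 and end >= start");
    }
    List<long[]> ranges = new ArrayList<>(n);
    long span = endMillis - startMillis;
    for (int i = 0; i < n; i++) {
      long low = startMillis + span * i / n;
      long high = startMillis + span * (i + 1) / n; // the last range ends exactly at endMillis
      ranges.add(new long[] {low, high});
    }
    return ranges;
  }
}
```

Equal time slices do not guarantee equal row counts, which is the gap the histogram-based and PK-chunking approaches are meant to address.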



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316876
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 17:58
Start Date: 23/Sep/19 17:58
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327250834
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String 
column, long value, String v
 String[] batchIdResultIdArray = 
partitionPkChunkingBatchIdResultIterator.next().split(":");
 String batchId = batchIdResultIdArray[0];
 String resultId = batchIdResultIdArray[1];
+log.info(String.format("PK-Chunking work unit: fetching file for 
(jobId=%s, batchId=%s, resultId=%s) ",
 
 Review comment:
   Thanks for the online talk; fixed.
 



Issue Time Tracking
---

Worklog Id: (was: 316876)
Time Spent: 8h 10m  (was: 8h)



[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-23 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327250904
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -775,17 +775,15 @@ public SalesforceBulkJobId 
getQueryResultIdsPkChunking(String entity, List

[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-23 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327250834
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String 
column, long value, String v
 String[] batchIdResultIdArray = 
partitionPkChunkingBatchIdResultIterator.next().split(":");
 String batchId = batchIdResultIdArray[0];
 String resultId = batchIdResultIdArray[1];
+log.info(String.format("PK-Chunking work unit: fetching file for 
(jobId=%s, batchId=%s, resultId=%s) ",
 
 Review comment:
   Thanks for the online talk; fixed.




[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316874
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 17:51
Start Date: 23/Sep/19 17:51
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327247892
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -598,6 +595,8 @@ public String getTimestampPredicateCondition(String 
column, long value, String v
 String[] batchIdResultIdArray = 
partitionPkChunkingBatchIdResultIterator.next().split(":");
 String batchId = batchIdResultIdArray[0];
 String resultId = batchIdResultIdArray[1];
+log.info(String.format("PK-Chunking work unit: fetching file for 
(jobId=%s, batchId=%s, resultId=%s) ",
 
 Review comment:
   BTW, we have been using this pattern a lot; I copied the code :)
   I searched our code but didn't find a good example. Can you point me to 
some sample code? Are you talking about `MessageFormat`? 
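
For reference, a small sketch of the two JDK formatting styles mentioned here. Whether the reviewer meant `MessageFormat` or SLF4J-style `{}` placeholders is not stated in this thread, so the SLF4J form appears only as a comment. `LogFormatSketch` and its methods are hypothetical names, and the message text is shortened from the diff.

```java
import java.text.MessageFormat;

final class LogFormatSketch {
  // printf-style placeholders, as used with String.format in the diff above.
  static String withStringFormat(String jobId, String batchId, String resultId) {
    return String.format("fetching file for (jobId=%s, batchId=%s, resultId=%s)",
        jobId, batchId, resultId);
  }

  // Indexed placeholders from java.text.MessageFormat.
  static String withMessageFormat(String jobId, String batchId, String resultId) {
    return MessageFormat.format("fetching file for (jobId={0}, batchId={1}, resultId={2})",
        jobId, batchId, resultId);
  }

  // With an SLF4J logger (for example the one Lombok's @Slf4j provides), the call would be:
  //   log.info("fetching file for (jobId={}, batchId={}, resultId={})", jobId, batchId, resultId);
  // which leaves formatting to the logger, so it is skipped when INFO is disabled.
}
```

For plain string arguments the two JDK styles produce the same text; `MessageFormat` additionally applies locale-specific formatting to numbers and dates and treats single quotes specially.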
 



Issue Time Tracking
---

Worklog Id: (was: 316874)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316863
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 17:40
Start Date: 23/Sep/19 17:40
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327243183
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -857,7 +850,7 @@ public SalesforceBulkJobId 
getQueryResultIdsPkChunking(String entity, List

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user-specified partition (set up time range in job file)
> However, fetching the full data for tables like Task and Contract fails from 
> time to time.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-23 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327243183
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -857,7 +850,7 @@ public SalesforceBulkJobId 
getQueryResultIdsPkChunking(String entity, List

[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=316858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316858
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 17:36
Start Date: 23/Sep/19 17:36
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327241495
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -869,7 +862,7 @@ public SalesforceBulkJobId 
getQueryResultIdsPkChunking(String entity, List

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user-specified partition (set up time range in job file)
> However, fetching the full data for tables like Task and Contract fails from 
> time to time.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-23 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r327241495
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -869,7 +862,7 @@ public SalesforceBulkJobId 
getQueryResultIdsPkChunking(String entity, List

[jira] [Resolved] (GOBBLIN-885) Fix ORC-Compaction bug in type-casting

2019-09-23 Thread Hung Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-885.
---
Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request #2738
[https://github.com/apache/incubator-gobblin/pull/2738]

> Fix ORC-Compaction bug in type-casting
> --
>
> Key: GOBBLIN-885
> URL: https://issues.apache.org/jira/browse/GOBBLIN-885
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Lei Sun
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GOBBLIN-885) Fix ORC-Compaction bug in type-casting

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-885?focusedWorklogId=316838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-316838
 ]

ASF GitHub Bot logged work on GOBBLIN-885:
--

Author: ASF GitHub Bot
Created on: 23/Sep/19 17:16
Start Date: 23/Sep/19 17:16
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2738: 
[GOBBLIN-885]Fix orc-Compaction bug in non-dedup mode and add unit-test
URL: https://github.com/apache/incubator-gobblin/pull/2738
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 316838)
Time Spent: 40m  (was: 0.5h)
