[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325346830
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }
 
   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
 
 Review comment:
   I had a ticket for this - 
https://jira01.corp.linkedin.com:8443/browse/DSS-1
   PK chunking uses this function.
   Sometimes, the parent may not be the first element in the list (see the 
screenshot in the ticket).
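Since the parent batch is not guaranteed to be the first element, a position-independent filter is safer than skipping index 0. A minimal sketch of that idea (the `Batch` and `State` types below are simplified stand-ins for the WSC `BatchInfo`/`BatchStateEnum`, not the actual Gobblin code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PkBatchFilter {
  // Simplified stand-ins for com.sforce.async.BatchInfo / BatchStateEnum.
  enum State { NotProcessed, InProgress, Completed, Failed }

  static class Batch {
    final String id;
    final State state;
    Batch(String id, State state) { this.id = id; this.state = state; }
  }

  // Keep every batch except the PK-chunking parent, which Salesforce reports
  // as NotProcessed and which may appear anywhere in the list, not just first.
  static List<Batch> chunkBatches(List<Batch> all) {
    List<Batch> chunks = new ArrayList<>();
    for (Batch b : all) {
      if (b.state != State.NotProcessed) {
        chunks.add(b);
      }
    }
    return chunks;
  }

  public static void main(String[] args) {
    List<Batch> batches = Arrays.asList(
        new Batch("b1", State.Completed),
        new Batch("parent", State.NotProcessed),  // parent is not first here
        new Batch("b2", State.InProgress));
    System.out.println(chunkBatches(batches).size());  // 2
  }
}
```

Filtering on state rather than position handles whatever ordering the server returns.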
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325344285
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
     BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
     if (usingPkChunking && bulkBatchInfo.getState() == BatchStateEnum.NotProcessed) {
 
 Review comment:
   Thanks! removed.




[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325354169
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }
 
   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
+      BatchStateEnum state = bi.getState();
+      while (state != BatchStateEnum.Completed && state != BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+        Thread.sleep(waitInterval * 1000);
         bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
 
 Review comment:
   Good catch! Thanks!
   I should not have used the stale `state` variable. My test worked only because 
I was using a breakpoint, which gave SFDC enough time to execute.
   
   I did more refactoring for this part and am pushing the code in advance. Will 
test during today and update you. Thanks!
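The fix under discussion, re-reading the batch state on every iteration instead of caching a stale copy, can be sketched with a status supplier standing in for `bulkConnection.getBatchInfo` (the names and types here are illustrative, not the actual Gobblin code):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class BatchPoller {
  // Simplified stand-in for com.sforce.async.BatchStateEnum.
  enum State { InProgress, Completed, Failed, NotProcessed }

  // Poll until the batch reaches a terminal state, refreshing the state on
  // every iteration; the bug under review kept testing a stale local copy.
  static State waitForBatch(Supplier<State> refresh, long sleepMillis)
      throws InterruptedException {
    State state = refresh.get();
    while (state != State.Completed && state != State.Failed
        && state != State.NotProcessed) {
      Thread.sleep(sleepMillis);
      state = refresh.get();  // re-fetch, as bulkConnection.getBatchInfo would
    }
    return state;
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated server: InProgress twice, then Completed.
    Iterator<State> replies = Arrays.asList(
        State.InProgress, State.InProgress, State.Completed).iterator();
    System.out.println(waitForBatch(replies::next, 1));  // Completed
  }
}
```

Without the re-fetch inside the loop body, a batch that starts as `InProgress` would spin forever, which is exactly the reviewer's point.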




[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325343123
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
     BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
 Review comment:
   >>   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
   
   I tried not to change the existing code in **getQueryResultIds**, and this 
line is necessary. **getBatchInfoList** returns **BatchIdAndResultId** values, 
i.e. a list of resultIds with their corresponding batch Ids.
   One bulkJobId can have multiple results even when it is not PK chunking.
   
   I am not using **getQueryResultIds** for PK chunking. If we want to do further 
code refactoring, let's do it in another ticket.




[GitHub] [incubator-gobblin] menarguez commented on issue #2734: GOBBLIN-880 Bump CouchbaseWriter Couchbase SDK version + write docs +…

2019-09-17 Thread GitBox
menarguez commented on issue #2734: GOBBLIN-880 Bump CouchbaseWriter Couchbase SDK version + write docs +…
URL: https://github.com/apache/incubator-gobblin/pull/2734#issuecomment-532290703
 
 
   @yukuai518 I bumped the Couchbase version to introduce cert-based 
authentication, as shown in the Couchbase Java SDK docs at 
https://docs.couchbase.com/java-sdk/2.7/sdk-authentication-overview.html#authenticating-a-java-client-by-certificate, 
using com.couchbase.client.java.auth.CertAuthenticator. Something I just 
realized is that Couchbase now also supports RBAC auth that includes the 
username in the connection string; see this 
[example](https://docs.couchbase.com/java-sdk/2.7/start-using-sdk.html). This 
should also be implemented in a separate PR.




[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313806 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283088
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }
 
   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
      throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
 
 Review comment:
   Is it no longer true that the first BatchInfo is for the parent rather than 
the PK chunking batches, or are you considering it safe to apply the same check 
since the parent's state should be Completed?
 



Issue Time Tracking
---

Worklog Id: (was: 313806)
Time Spent: 5h 20m  (was: 5h 10m)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user specified partition (set up time range in job file)
> However, tables like Task and Contract fail from time to time to fetch the 
> full data.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]
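Per the Salesforce doc linked above, PK chunking is requested through the `Sforce-Enable-PKChunking` header on the Bulk API job, with `chunkSize` capped at 250,000 records (default 100,000). A tiny sketch of building that header value with the size clamped into range (the helper and the lower bound of 1 are illustrative assumptions, not part of the connector):

```java
public class PkChunkingHeader {
  // Bounds from the Salesforce PK chunking docs: chunkSize max is 250,000;
  // the lower bound of 1 is an assumption (any positive size is accepted).
  static final int MIN_CHUNK = 1;
  static final int MAX_CHUNK = 250_000;

  // Build the value for the "Sforce-Enable-PKChunking" request header,
  // clamping the requested size into the supported range.
  static String headerValue(int requestedChunkSize) {
    int size = Math.max(MIN_CHUNK, Math.min(MAX_CHUNK, requestedChunkSize));
    return "chunkSize=" + size;
  }

  public static void main(String[] args) {
    System.out.println(headerValue(300_000));  // chunkSize=250000
    System.out.println(headerValue(100_000));  // chunkSize=100000
  }
}
```

This mirrors the `Math.max(MIN, Math.min(MAX, ...))` clamping pattern the PR applies to its `pkChunkingSize` setting.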



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313807 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325267416
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
     BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
     if (usingPkChunking && bulkBatchInfo.getState() == BatchStateEnum.NotProcessed) {
 
 Review comment:
   Remove this block too since usingPkChunking is no longer set.
 



Issue Time Tracking
---

Worklog Id: (was: 313807)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313803 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325284644
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;
 
+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)
 
 Review comment:
   How come this check was removed? If the user has specified partitions then 
that should override PK chunking.
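The guard the reviewer refers to can be paraphrased as a pure function: PK chunking stays off when the user supplied explicit partitions or allowed too many partitions. A sketch under stated assumptions (the limit constant below is a placeholder; the real `PK_CHUNKING_MAX_PARTITIONS_LIMIT` value lives in the connector):

```java
public class PkChunkingGuard {
  // Placeholder value; the connector defines the real limit.
  static final int PK_CHUNKING_MAX_PARTITIONS_LIMIT = 3;

  // Mirror of the removed constructor check: user-specified partitions or a
  // high max-partitions setting override the PK chunking request.
  static boolean pkChunkingEnabled(boolean requested,
      boolean userSpecifiedPartitions, int maxPartitions) {
    if (userSpecifiedPartitions
        || maxPartitions > PK_CHUNKING_MAX_PARTITIONS_LIMIT) {
      return false;
    }
    return requested;
  }

  public static void main(String[] args) {
    System.out.println(pkChunkingEnabled(true, true, 1));   // false
    System.out.println(pkChunkingEnabled(true, false, 2));  // true
  }
}
```

Expressed this way, the precedence question in the review (user partitions beat PK chunking) is explicit in the first branch.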
 



Issue Time Tracking
---

Worklog Id: (was: 313803)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313805 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325264698
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;
 
+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)
-        || state.getPropAsInt(ConfigurationKeys.SOURCE_MAX_NUMBER_OF_PARTITIONS,
-        ConfigurationKeys.DEFAULT_MAX_NUMBER_OF_PARTITIONS) > PK_CHUNKING_MAX_PARTITIONS_LIMIT) {
-      if (state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false)) {
-        log.warn("Max partitions too high, so PK chunking is not enabled");
-      }
-
-      this.pkChunking = false;
-    } else {
-      this.pkChunking = state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false);
-    }
+    this.workUnitState = state;
 
+    this.sfConnector = (SalesforceConnector) this.connector;
     this.pkChunkingSize =
         Math.max(MIN_PK_CHUNKING_SIZE,
-            Math.min(MAX_PK_CHUNKING_SIZE, state.getPropAsInt(PK_CHUNKING_SIZE_KEY, DEFAULT_PK_CHUNKING_SIZE)));
+            Math.min(MAX_PK_CHUNKING_SIZE, state.getPropAsInt(PARTITION_PK_CHUNKING_SIZE, DEFAULT_PK_CHUNKING_SIZE)));
 
-    this.pkChunkingSkipCountCheck = state.getPropAsBoolean(PK_CHUNKING_SKIP_COUNT_CHECK, DEFAULT_PK_CHUNKING_SKIP_COUNT_CHECK);
+    this.pkChunkingSkipCountCheck = true;  // won't be able to get count
 
 Review comment:
   Should remove this since the config is no longer used.
 



Issue Time Tracking
---

Worklog Id: (was: 313805)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313802 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283786
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }
 
   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
+      BatchStateEnum state = bi.getState();
+      while (state != BatchStateEnum.Completed && state != BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+        Thread.sleep(waitInterval * 1000);
         bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
 
 Review comment:
   You are assigning to the outer loop variable; I'm not sure if this is 
intentional. Also, I don't see `state` being updated, so won't this loop forever?
 



Issue Time Tracking
---

Worklog Id: (was: 313802)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313804 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m 
  Work Description: htran1 commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325281910
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
     BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
 Review comment:
   Remove this?
 



Issue Time Tracking
---

Worklog Id: (was: 313804)
Time Spent: 5h 10m  (was: 5h)



[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283088
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void 
fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
 if (this.bulkConnection != null
-&& 
!this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed"))
 {
+&& 
!this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed"))
 {
   log.info("Closing salesforce bulk job connection");
-  this.bulkConnection.closeJob(this.bulkJob.getId());
+  this.bulkConnection.closeJob(this.getBulkJobId());
 }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
 return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), 
RestApiCommandType.GET));
   }
 
   /**
* Waits for the PK batches to complete. The wait will stop after all 
batches are complete or on the first failed batch
* @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
* @return the last {@link BatchInfo} processed
* @throws InterruptedException
* @throws AsyncApiException
*/
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
waitInterval)
   throws InterruptedException, AsyncApiException {
 BatchInfo batchInfo = null;
 BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-// Wait for all batches other than the first one. The first one is not 
processed in PK chunking mode
-for (int i = 1; i < batchInfos.length; i++) {
-  BatchInfo bi = batchInfos[i];
-
-  // get refreshed job status
-  bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-  while ((bi.getState() != BatchStateEnum.Completed)
-  && (bi.getState() != BatchStateEnum.Failed)) {
-Thread.sleep(retryInterval * 1000);
+for (BatchInfo bi: batchInfos) {
 
 Review comment:
   Is it no longer true that the first BatchInfo is for the parent and not the 
PK-chunking batches, or are you considering it safe to apply the same check 
since the state should be completed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325281910
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = 
this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
 Review comment:
   Remove this?




[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325264698
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;
 
+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
 super(state);
-this.sfConnector = (SalesforceConnector) this.connector;
-
-// don't allow pk chunking if max partitions too high or have user 
specified partitions
-if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, 
false)
-|| 
state.getPropAsInt(ConfigurationKeys.SOURCE_MAX_NUMBER_OF_PARTITIONS,
-ConfigurationKeys.DEFAULT_MAX_NUMBER_OF_PARTITIONS) > 
PK_CHUNKING_MAX_PARTITIONS_LIMIT) {
-  if (state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false)) {
-log.warn("Max partitions too high, so PK chunking is not enabled");
-  }
-
-  this.pkChunking = false;
-} else {
-  this.pkChunking = state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false);
-}
+this.workUnitState = state;
 
+this.sfConnector = (SalesforceConnector) this.connector;
 this.pkChunkingSize =
 Math.max(MIN_PK_CHUNKING_SIZE,
-Math.min(MAX_PK_CHUNKING_SIZE, 
state.getPropAsInt(PK_CHUNKING_SIZE_KEY, DEFAULT_PK_CHUNKING_SIZE)));
+Math.min(MAX_PK_CHUNKING_SIZE, 
state.getPropAsInt(PARTITION_PK_CHUNKING_SIZE, DEFAULT_PK_CHUNKING_SIZE)));
 
-this.pkChunkingSkipCountCheck = 
state.getPropAsBoolean(PK_CHUNKING_SKIP_COUNT_CHECK, 
DEFAULT_PK_CHUNKING_SKIP_COUNT_CHECK);
+this.pkChunkingSkipCountCheck = true;// won't be able to get count
 
 Review comment:
   Should remove this since the config is no longer used.




[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283786
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void 
fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
 if (this.bulkConnection != null
-&& 
!this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed"))
 {
+&& 
!this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed"))
 {
   log.info("Closing salesforce bulk job connection");
-  this.bulkConnection.closeJob(this.bulkJob.getId());
+  this.bulkConnection.closeJob(this.getBulkJobId());
 }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
 return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), 
RestApiCommandType.GET));
   }
 
   /**
* Waits for the PK batches to complete. The wait will stop after all 
batches are complete or on the first failed batch
* @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
* @return the last {@link BatchInfo} processed
* @throws InterruptedException
* @throws AsyncApiException
*/
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
waitInterval)
   throws InterruptedException, AsyncApiException {
 BatchInfo batchInfo = null;
 BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-// Wait for all batches other than the first one. The first one is not 
processed in PK chunking mode
-for (int i = 1; i < batchInfos.length; i++) {
-  BatchInfo bi = batchInfos[i];
-
-  // get refreshed job status
-  bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-  while ((bi.getState() != BatchStateEnum.Completed)
-  && (bi.getState() != BatchStateEnum.Failed)) {
-Thread.sleep(retryInterval * 1000);
+for (BatchInfo bi: batchInfos) {
+  BatchStateEnum state = bi.getState();
+  while (state != BatchStateEnum.Completed && state != 
BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+Thread.sleep(waitInterval * 1000);
 bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), 
bi.getId());
 
 Review comment:
   You are assigning to the outside loop variable. Not sure if this is 
intentional. Also, I don't see state being updated, so won't this loop forever?
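
The bug being pointed out, polling a stale state value, can be illustrated with a self-contained sketch. `BatchState` and `BatchPoller` below are hypothetical stand-ins for the Salesforce async API's `BatchStateEnum` and `BulkConnection.getBatchInfo`, not the actual PR code; the key point is that the state must be re-read from the refreshed batch info on every iteration:

```java
// Hypothetical, self-contained sketch of the polling fix the reviewer is
// pointing at: the while condition must test a value that is refreshed each
// iteration, otherwise it tests a stale snapshot forever.
public class BatchWait {
  enum BatchState { IN_PROGRESS, COMPLETED, FAILED, NOT_PROCESSED }

  // Stands in for bulkConnection.getBatchInfo(jobId, batchId).getState().
  interface BatchPoller { BatchState poll(String batchId); }

  static BatchState waitForBatch(BatchPoller poller, String batchId, long waitIntervalMs) {
    BatchState state = poller.poll(batchId);
    // NOT_PROCESSED marks the PK-chunking parent batch and is treated as terminal.
    while (state != BatchState.COMPLETED && state != BatchState.FAILED
        && state != BatchState.NOT_PROCESSED) {
      try {
        Thread.sleep(waitIntervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return state;
      }
      state = poller.poll(batchId);   // the re-assignment the original loop was missing
    }
    return state;
  }

  public static void main(String[] args) {
    // Fake poller: reports IN_PROGRESS twice, then COMPLETED.
    final int[] calls = {0};
    BatchPoller poller = id -> ++calls[0] < 3 ? BatchState.IN_PROGRESS : BatchState.COMPLETED;
    System.out.println(waitForBatch(poller, "batch-1", 1L));   // prints "COMPLETED"
  }
}
```

With the re-assignment removed, the same fake poller would spin forever, which is exactly the failure mode the reviewer describes.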




[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325267416
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = 
this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
   if (usingPkChunking && bulkBatchInfo.getState() == 
BatchStateEnum.NotProcessed) {
 
 Review comment:
   Remove this block too since usingPkChunking is no longer set.




[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition

2019-09-17 Thread GitBox
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature 
that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325284644
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;
 
+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
 super(state);
-this.sfConnector = (SalesforceConnector) this.connector;
-
-// don't allow pk chunking if max partitions too high or have user 
specified partitions
-if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, 
false)
 
 Review comment:
   How come this check was removed? If the user has specified partitions then 
that should override PK chunking.




[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al…

2019-09-17 Thread GitBox
shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose 
metrics() API in GobblinKafkaConsumerClient to al…
URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325302830
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-09/src/main/java/org/apache/gobblin/kafka/client/Kafka09ConsumerClient.java
 ##
 @@ -162,6 +170,35 @@ public KafkaConsumerRecord apply(ConsumerRecord 
input) {
 });
   }
 
+  @Override
+  public Map metrics() {
+Map kafkaMetrics = (Map) 
this.consumer.metrics();
+Map codaHaleMetricMap = new HashMap<>();
 
 Review comment:
   do we need a new instance every time? 
   This diff doesn't show when and how this method is being called, so it's hard 
to say whether this new instance and copy on each call to metrics() is a good 
idea or not.
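
The allocation question above can be illustrated in a self-contained way. `RawMetric` and the `String -> Double` map below are hypothetical simplifications of `org.apache.kafka.common.Metric` and the Coda Hale metric map; none of this is the actual Kafka09ConsumerClient code, it only contrasts the two design options:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the trade-off the reviewer raises: rebuild the
// translated map on every call (always fresh, extra allocation), or
// translate once into a lazily-built cached view (one allocation, frozen values).
public class MetricTranslation {
  // Hypothetical stand-in for org.apache.kafka.common.Metric.
  interface RawMetric { double value(); }

  private final Map<String, RawMetric> source;
  private Map<String, Double> cached;   // lazily built translated view

  MetricTranslation(Map<String, RawMetric> source) { this.source = source; }

  // Fresh copy per call: callers always see current values.
  Map<String, Double> metricsFresh() {
    Map<String, Double> out = new HashMap<>();
    source.forEach((name, m) -> out.put(name, m.value()));
    return out;
  }

  // Cached: one allocation, but values are frozen at the first call.
  Map<String, Double> metricsCached() {
    if (cached == null) {
      cached = Collections.unmodifiableMap(metricsFresh());
    }
    return cached;
  }
}
```

If the real consumer metric objects are live gauges, the fresh-copy variant is the safer default; caching only makes sense when callers poll very frequently and stale values are acceptable.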




[jira] [Work logged] (GOBBLIN-876) Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics to be reported

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-876?focusedWorklogId=313828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313828
 ]

ASF GitHub Bot logged work on GOBBLIN-876:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:49
Start Date: 17/Sep/19 17:49
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2730: 
GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al…
URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325302830
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-09/src/main/java/org/apache/gobblin/kafka/client/Kafka09ConsumerClient.java
 ##
 @@ -162,6 +170,35 @@ public KafkaConsumerRecord apply(ConsumerRecord 
input) {
 });
   }
 
+  @Override
+  public Map metrics() {
+Map kafkaMetrics = (Map) 
this.consumer.metrics();
+Map codaHaleMetricMap = new HashMap<>();
 
 Review comment:
   do we need a new instance every time? 
   This diff doesn't show when and how this method is being called, so it's hard 
to say whether this new instance and copy on each call to metrics() is a good 
idea or not.
 



Issue Time Tracking
---

Worklog Id: (was: 313828)
Time Spent: 2h 50m  (was: 2h 40m)

> Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics 
> to be reported
> --
>
> Key: GOBBLIN-876
> URL: https://issues.apache.org/jira/browse/GOBBLIN-876
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Newer Kafka consumers expose a metrics() API that reports a number of consumer 
> metrics, such as lag and latency, which are very useful for monitoring and 
> debugging. We expose a metrics() API in GobblinKafkaConsumerClient to allow 
> consumer metrics to be reported. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al…

2019-09-17 Thread GitBox
shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose 
metrics() API in GobblinKafkaConsumerClient to al…
URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325301544
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java
 ##
 @@ -86,6 +89,15 @@
*/
   public Iterator consume(KafkaPartition partition, long 
nextOffset, long maxOffset);
 
+  /**
+   * API to return underlying Kafka consumer metrics. The individual 
implementations must translate
+   * org.apache.kafka.common.Metric to Coda Hale Metrics.
+   * @return
+   */
+  public default Map metrics() {
 
 Review comment:
   since metrics is not a verb... maybe this method should be called 
getMetrics()?




[jira] [Work logged] (GOBBLIN-876) Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics to be reported

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-876?focusedWorklogId=313820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313820
 ]

ASF GitHub Bot logged work on GOBBLIN-876:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:46
Start Date: 17/Sep/19 17:46
Worklog Time Spent: 10m 
  Work Description: shirshanka commented on pull request #2730: 
GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al…
URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325301544
 
 

 ##
 File path: 
gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java
 ##
 @@ -86,6 +89,15 @@
*/
   public Iterator consume(KafkaPartition partition, long 
nextOffset, long maxOffset);
 
+  /**
+   * API to return underlying Kafka consumer metrics. The individual 
implementations must translate
+   * org.apache.kafka.common.Metric to Coda Hale Metrics.
+   * @return
+   */
+  public default Map metrics() {
 
 Review comment:
   since metrics is not a verb... maybe this method should be called 
getMetrics()?
 



Issue Time Tracking
---

Worklog Id: (was: 313820)
Time Spent: 2h 40m  (was: 2.5h)

> Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics 
> to be reported
> --
>
> Key: GOBBLIN-876
> URL: https://issues.apache.org/jira/browse/GOBBLIN-876
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-kafka
>Affects Versions: 0.15.0
>Reporter: Sudarshan Vasudevan
>Assignee: Shirshanka Das
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Newer Kafka consumers expose a metrics() API that reports a number of consumer 
> metrics, such as lag and latency, which are very useful for monitoring and 
> debugging. We expose a metrics() API in GobblinKafkaConsumerClient to allow 
> consumer metrics to be reported. 





[jira] [Work logged] (GOBBLIN-880) Bump CouchbaseWriter Couchbase SDK version + write docs + cert based auth + enable TTL + dnsSrv

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-880?focusedWorklogId=313772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313772
 ]

ASF GitHub Bot logged work on GOBBLIN-880:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 16:11
Start Date: 17/Sep/19 16:11
Worklog Time Spent: 10m 
  Work Description: menarguez commented on issue #2734: GOBBLIN-880 Bump 
CouchbaseWriter Couchbase SDK version + write docs +…
URL: 
https://github.com/apache/incubator-gobblin/pull/2734#issuecomment-532290703
 
 
   @yukuai518  I bumped the Couchbase version to introduce cert-based 
authentication, as shown in the Couchbase Java SDK docs at 
https://docs.couchbase.com/java-sdk/2.7/sdk-authentication-overview.html#authenticating-a-java-client-by-certificate
 using com.couchbase.client.java.auth.CertAuthenticator. Something I just 
realized is that Couchbase now also supports RBAC auth that includes the 
username in the connection string; see the 
[example](https://docs.couchbase.com/java-sdk/2.7/start-using-sdk.html). This 
should also be implemented in a separate PR.
 



Issue Time Tracking
---

Worklog Id: (was: 313772)
Remaining Estimate: 167h 10m  (was: 167h 20m)
Time Spent: 50m  (was: 40m)

> Bump CouchbaseWriter Couchbase SDK version + write docs + cert based auth + 
> enable TTL + dnsSrv
> ---
>
> Key: GOBBLIN-880
> URL: https://issues.apache.org/jira/browse/GOBBLIN-880
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-couchbase
>Reporter: Michael A Menarguez
>Assignee: Shirshanka Das
>Priority: Major
>  Labels: Couchbase
> Fix For: 0.15.0
>
>   Original Estimate: 168h
>  Time Spent: 50m
>  Remaining Estimate: 167h 10m
>
> h1. CURRENT ISSUES
> Currently CouchbaseWriter.java lacks the ability to do the following:
>  # Use certificate based authentication
>  # Set document expiry (TTL)
>  ** based on write time
>  ** based on an offset specified field contained in the record's data (JSON)
>  ** (WILL NOT ADDRESS) set expiry based on a field contained in the record's 
> data
>  # Set DNS SRV for bootstrap host discovery setting
>  # Missing documentation on CouchbaseWriter usage
>  # Testing does not bring in CouchbaseMock correctly and causes problems 
> while bumping com.couchbase.client:java-client
> h1. PROPOSED SOLUTIONS
>  # Add logic to connect using certificate based auth to the buckets (Will 
> need to bump  com.couchbase.client:java-client to a newer version like 2.7.6) 
> and associated configs
>  # TTL implementation
>  ## Add configs to allow setting a TTL (documentTTL) and also specify the 
> timeunits (documentTTLUnits) of these settings
>  ## Add logic to specify the path to key to the field containing the source 
> timestamp (documentTTLOriginField) and its units (documentTTLOriginUnits) to 
> disambiguate between UNIX (sec) timestamps and other formats like timestamps 
> in milliseconds.
>  ## N/A but logic would be similar to (2)
>  # Add missing dnsSrv config
>  # Write proper documentation
>  # Bring in CouchbaseMock from Gradle and adapt existing unit tests.



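
Item (2) of the TTL proposal above can be sketched in a self-contained way. This is a hypothetical illustration, not the PR's implementation; the names follow the proposed configs (documentTTL, documentTTLUnits), and the 30-day relative-versus-absolute cutoff is Couchbase's documented convention for document expiry values:

```java
import java.util.concurrent.TimeUnit;

// Sketch: turn a configured documentTTL + documentTTLUnits pair into the
// expiry value a Couchbase document mutation expects. Couchbase treats an
// expiry of 30 days or less as a relative offset in seconds, and anything
// larger as an absolute Unix timestamp.
public class TtlConfig {
  static final long THIRTY_DAYS_SECONDS = TimeUnit.DAYS.toSeconds(30);

  static long toExpirySeconds(long ttl, TimeUnit units, long nowEpochSeconds) {
    long ttlSeconds = units.toSeconds(ttl);
    if (ttlSeconds == 0) {
      return 0;                         // 0 means "never expire"
    }
    return ttlSeconds <= THIRTY_DAYS_SECONDS
        ? ttlSeconds                    // relative offset
        : nowEpochSeconds + ttlSeconds; // beyond 30 days: absolute timestamp
  }
}
```

Separating the unit conversion from the write path is what makes the proposal's documentTTLOriginField/documentTTLOriginUnits variants easy to add later: only the source of `nowEpochSeconds` changes.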


[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313879
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325344285
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = 
this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
   if (usingPkChunking && bulkBatchInfo.getState() == 
BatchStateEnum.NotProcessed) {
 
 Review comment:
   Thanks! removed.
 



Issue Time Tracking
---

Worklog Id: (was: 313879)
Time Spent: 5h 50m  (was: 5h 40m)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate a histogram and split by row numbers)
>  * user-specified partition (set up time ranges in the job file)
> However, tables like Task and Contract fail from time to time to fetch the 
> full data.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]



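
For readers unfamiliar with the feature referenced in the issue above: Salesforce performs PK chunking server-side once the `Sforce-Enable-PKChunking` request header is set, so no client ever computes these ranges itself. The sketch below is purely a conceptual illustration of what the feature does, slicing an ordered primary-key space into fixed-size ranges with one Bulk API batch per range; `PkChunkPlanner` is a hypothetical name, not part of the connector:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual illustration of PK chunking: divide totalRecords ordered by
// primary key into half-open [start, end) ranges of at most chunkSize each.
public class PkChunkPlanner {
  static List<long[]> chunkRanges(long totalRecords, long chunkSize) {
    List<long[]> ranges = new ArrayList<>();
    for (long start = 0; start < totalRecords; start += chunkSize) {
      // Each range corresponds to one Bulk API batch in real PK chunking.
      ranges.add(new long[] {start, Math.min(start + chunkSize, totalRecords)});
    }
    return ranges;
  }
}
```

Because each range is bounded by primary key rather than by time, this avoids the skew problems that make time-based partitions fail on tables like Task and Contract.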


[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313877
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325343123
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = 
this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
 
 Review comment:
   >>   BatchInfoList batchInfoList = 
this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
   I tried not to change the existing code in **getQueryResultIds**, and this 
line is necessary. The **getBatchInfoList** function's return value is 
**BatchIdAndResultId**, i.e. a list of resultIds with their corresponding batch 
IDs.
   One bulkJobId could have multiple results even when it is not PK chunking.
   
   I am not using **getQueryResultIds** for PK chunking. If we want to do further 
code refactoring, let's do it in another ticket.
 



Issue Time Tracking
---

Worklog Id: (was: 313877)
Time Spent: 5h 40m  (was: 5.5h)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate a histogram and split by row numbers)
>  * user-specified partition (set up time ranges in the job file)
> However, tables like Task and Contract fail from time to time to fetch the 
> full data.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]





[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313875
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325354169
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void 
fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
 if (this.bulkConnection != null
-&& 
!this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed"))
 {
+&& 
!this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed"))
 {
   log.info("Closing salesforce bulk job connection");
-  this.bulkConnection.closeJob(this.bulkJob.getId());
+  this.bulkConnection.closeJob(this.getBulkJobId());
 }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
 return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), 
RestApiCommandType.GET));
   }
 
   /**
* Waits for the PK batches to complete. The wait will stop after all 
batches are complete or on the first failed batch
* @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
* @return the last {@link BatchInfo} processed
* @throws InterruptedException
* @throws AsyncApiException
*/
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int 
waitInterval)
   throws InterruptedException, AsyncApiException {
 BatchInfo batchInfo = null;
 BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-// Wait for all batches other than the first one. The first one is not 
processed in PK chunking mode
-for (int i = 1; i < batchInfos.length; i++) {
-  BatchInfo bi = batchInfos[i];
-
-  // get refreshed job status
-  bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-  while ((bi.getState() != BatchStateEnum.Completed)
-  && (bi.getState() != BatchStateEnum.Failed)) {
-Thread.sleep(retryInterval * 1000);
+for (BatchInfo bi: batchInfos) {
+  BatchStateEnum state = bi.getState();
+  while (state != BatchStateEnum.Completed && state != 
BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+Thread.sleep(waitInterval * 1000);
 bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), 
bi.getId());
 
 Review comment:
   Good catch! Thanks!
   I should not use the state variable. My test worked because I was using a 
breakpoint, so there was enough time for SFDC to finish executing.
   
   I did more refactoring for this part. Pushing the code in advance; will test 
today and update you. Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 313875)
Time Spent: 5h 40m  (was: 5.5h)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a 
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate a histogram and split by row numbers)
>  * user-specified partition (set up time ranges in the job file)
> However, tables like Task and Contract fail from time to time to fetch the 
> full data.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]





[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313876
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325340740
 
 

 ##
 File path: 
gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;
 
+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
 super(state);
-this.sfConnector = (SalesforceConnector) this.connector;
-
-// don't allow pk chunking if max partitions too high or have user 
specified partitions
-if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, 
false)
 
 Review comment:
   This part of the code was only for **PK chunking**. We don't need second-level 
PK chunking any more. (If we have a normal pre-partition, we should not use PK 
chunking; it burns through the request quota.)
   
   I created a separate function, getQueryResultIdsPkChunking, for PK chunking, 
and made it not depend on the class member this.pkChunking, so I removed 
pkChunking.
 



Issue Time Tracking
---

Worklog Id: (was: 313876)
Time Spent: 5h 40m  (was: 5.5h)

> Add feature that enables PK-chunking in partition 
> --
>
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Alex Li
>Priority: Major
>  Labels: salesforce
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a
> giant query into multiple sub-queries. There are 3 mechanisms:
>  * simple partition (equally split by time)
>  * dynamic pre-partition (generate histogram and split by row numbers)
>  * user specified partition (set up time range in job file)
> However, tables like Task and Contract fail from time to time when fetching
> full data.
> We may want to utilize PK-chunking to partition the query.
>  
> The pk-chunking doc from SFDC - 
> [https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm]
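
As the linked doc describes, PK chunking is requested through the `Sforce-Enable-PKChunking` request header, whose value takes the form `chunkSize=100000; startRow=<record id>`. A hypothetical helper for building that header value (not part of this PR; with Salesforce WSC it would be passed via `BulkConnection.addHeader`) might look like:

```java
// Illustrative helper, not Gobblin's code: builds the value for the
// Sforce-Enable-PKChunking header described in the SFDC doc above.
final class PkChunkingHeader {

  static String headerValue(int chunkSize, String startRowId) {
    StringBuilder sb = new StringBuilder("chunkSize=").append(chunkSize);
    if (startRowId != null && !startRowId.isEmpty()) {
      // optional: resume chunking from a specific record id
      sb.append("; startRow=").append(startRowId);
    }
    return sb.toString();
  }
}
```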





[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313878
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m 
  Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: 
Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325346830
 
 

 ##
 File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java
 ##
 @@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }
 
-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }
 
   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
 
-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi: batchInfos) {
 
 Review comment:
   I had a ticket for this - 
https://jira01.corp.linkedin.com:8443/browse/DSS-1
   PK chunking uses this function. Sometimes the parent batch is not the first
element in the list (see the screenshot in the ticket), so the loop now
iterates over all batches instead of skipping the first one.
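
A minimal, self-contained sketch of the revised wait loop: poll every batch (the PK-chunking parent is not guaranteed to be first) and stop at the first failure. `BatchState`, `StatusFetcher`, and `waitForBatches` are illustrative stand-ins for Salesforce's `BatchStateEnum` and `BulkConnection` calls, not the actual PR code:

```java
import java.util.List;

// Simplified model of waiting on PK-chunking batches.
final class PkBatchWaiter {
  enum BatchState { QUEUED, IN_PROGRESS, COMPLETED, FAILED }

  // Stand-in for refreshing a batch's status from the server.
  interface StatusFetcher {
    BatchState fetch(String batchId) throws InterruptedException;
  }

  /** Returns the id of the last batch processed; stops on the first failed batch. */
  static String waitForBatches(List<String> batchIds, StatusFetcher fetcher,
                               long waitIntervalMillis) throws InterruptedException {
    String last = null;
    for (String id : batchIds) {            // all batches, including the first
      BatchState state = fetcher.fetch(id);
      while (state != BatchState.COMPLETED && state != BatchState.FAILED) {
        Thread.sleep(waitIntervalMillis);   // poll until a terminal state
        state = fetcher.fetch(id);
      }
      last = id;
      if (state == BatchState.FAILED) {
        break;                              // stop on the first failed batch
      }
    }
    return last;
  }
}
```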
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 313878)
Time Spent: 5h 40m  (was: 5.5h)








[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313924
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 21:15
Start Date: 17/Sep/19 21:15
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2722: GOBBLIN-865: Add 
feature that enables PK-chunking in partition
URL: 
https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr&el=h1) Report
   > Merging [#2722](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9bf9a882427e98e7f4ef089c4ca1bde42f4b36a3?src=pr&el=desc) will **decrease** coverage by `0.02%`.
   > The diff coverage is `1.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/graphs/tree.svg?width=650&token=4MgURJ0bGc&height=150&src=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2722      +/-   ##
   ============================================
   - Coverage     45.04%   45.01%   -0.03%     
   - Complexity     8739     8770      +31     
   ============================================
     Files          1880     1886       +6     
     Lines         70205    70591     +386     
     Branches       7707     7739      +32     
   ============================================
   + Hits          31623    31779     +156     
   - Misses        35651    35864     +213     
   - Partials       2931     2948      +17
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...obblin/salesforce/SalesforceConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUNvbmZpZ3VyYXRpb25LZXlzLmphdmE=) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...rg/apache/gobblin/salesforce/SalesforceSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZVNvdXJjZS5qYXZh) | `19.74% <5.66%> (-3.02%)` | `12 <1> (+1)` | |
   | [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `82.14% <0%> (-7.15%)` | `11% <0%> (-1%)` | |
   | [...apache/gobblin/runtime/local/LocalJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9jYWwvTG9jYWxKb2JMYXVuY2hlci5qYXZh) | `61.81% <0%> (-2.34%)` | `5% <0%> (ø)` | |
   | [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvdWNoYmFzZS93cml0ZXIvQ291Y2hiYXNlV3JpdGVyLmphdmE=) | `64.39% <0%> (-1.89%)` | `15% <0%> (+4%)` | |
   | [.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=) | `53.91% <0%> (-0.51%)` | `27% <0%> (ø)` | |
   | [...che/gobblin/hive/metastore/HiveMetaStoreUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlVXRpbHMuamF2YQ==) | `31.69% <0%> (-0.15%)` | `12% <0%> (ø)` | |
   | [...e/modules/flowgraph/datanodes/fs/AdlsDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr&el=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL2ZzL0FkbHNEYXRhTm9kZS5qYXZh) | `50% <0%> (ø)` | `2% <0%> (ø)` | :arrow_down: |
   | 

[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition

2019-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313923
 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--

Author: ASF GitHub Bot
Created on: 17/Sep/19 21:11
Start Date: 17/Sep/19 21:11
Worklog Time Spent: 10m 
  Work Description: codecov-io commented on issue #2722: GOBBLIN-865: Add 
feature that enables PK-chunking in partition
URL: 
https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100
 
 

[jira] [Created] (GOBBLIN-882) Fix Gobblin as a Service start scripts

2019-09-17 Thread William Lo (Jira)
William Lo created GOBBLIN-882:
--

 Summary: Fix Gobblin as a Service start scripts
 Key: GOBBLIN-882
 URL: https://issues.apache.org/jira/browse/GOBBLIN-882
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-service
Affects Versions: 0.15.0
Reporter: William Lo
Assignee: Abhishek Tiwari
 Fix For: 0.15.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)