[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325346830

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }

-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }

   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();

-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {

Review comment:
I had a ticket for this - https://jira01.corp.linkedin.com:8443/browse/DSS-1
PK chunking is using this function. Sometimes the parent may not be the first element in the list (see the screenshot in the ticket).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325344285

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
   if (usingPkChunking && bulkBatchInfo.getState() == BatchStateEnum.NotProcessed) {

Review comment:
Thanks! Removed.
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325354169

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }

-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }

   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();

-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
+      BatchStateEnum state = bi.getState();
+      while (state != BatchStateEnum.Completed && state != BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+        Thread.sleep(waitInterval * 1000);
         bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());

Review comment:
Good catch! Thanks! I should not have used the state variable. My test worked only because I was using a breakpoint, which gave SFDC enough time to execute. I did more refactoring for this part and am pushing the code in advance. I will test today and update you. Thanks!
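The bug acknowledged above is that the loop tests a `state` value captured once before the `while`, so the refreshed `BatchInfo` never feeds back into the loop condition. A minimal self-contained sketch of the corrected polling pattern, using a hypothetical `FakeBulkConnection` stand-in instead of the real Salesforce async API (all names here are illustrative, not the actual Gobblin code):

```java
import java.util.HashMap;
import java.util.Map;

public class PkBatchPollSketch {

  enum BatchState { InProgress, Completed, Failed, NotProcessed }

  // Stand-in for BulkConnection.getBatchInfo(jobId, batchId): the reported
  // state advances to Completed on the third poll, as the server eventually would.
  static class FakeBulkConnection {
    private final Map<String, Integer> polls = new HashMap<>();

    BatchState getBatchState(String jobId, String batchId) {
      int n = polls.merge(batchId, 1, Integer::sum);
      return n >= 3 ? BatchState.Completed : BatchState.InProgress;
    }
  }

  // Corrected loop: the state is re-read from the connection on every pass,
  // so the loop terminates once the server reports a terminal state.
  static BatchState waitForBatch(FakeBulkConnection conn, String jobId,
                                 String batchId, long waitMillis) throws InterruptedException {
    BatchState state = conn.getBatchState(jobId, batchId);
    while (state != BatchState.Completed
        && state != BatchState.Failed
        && state != BatchState.NotProcessed) {
      Thread.sleep(waitMillis);
      state = conn.getBatchState(jobId, batchId);  // refresh; don't reuse the stale value
    }
    return state;
  }

  public static void main(String[] args) throws InterruptedException {
    FakeBulkConnection conn = new FakeBulkConnection();
    System.out.println(waitForBatch(conn, "job-1", "batch-1", 1));  // prints "Completed"
  }
}
```

Treating `NotProcessed` as terminal matters here: in PK chunking mode the parent batch stays `NotProcessed` forever, so polling it with only `Completed`/`Failed` exits would hang.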
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325343123

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());

Review comment:
>> BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());

I tried not to change the existing code in **getQueryResultIds**, and this line is necessary. The **getBatchInfoList** function's return value is **BatchIdAndResultId**, which means a list of resultIds with their corresponding batch Ids. One bulkJobId can have multiple results even when PK chunking is not in use. I am not using **getQueryResultIds** for PK chunking. If we want to do further code refactoring, let's do it in another ticket.
[GitHub] [incubator-gobblin] menarguez commented on issue #2734: GOBBLIN-880 Bump CouchbaseWriter Couchbase SDK version + write docs +…
menarguez commented on issue #2734: GOBBLIN-880 Bump CouchbaseWriter Couchbase SDK version + write docs +…
URL: https://github.com/apache/incubator-gobblin/pull/2734#issuecomment-532290703

@yukuai518 I bumped the Couchbase version to introduce cert-based authentication, as shown in the Couchbase Java SDK docs at https://docs.couchbase.com/java-sdk/2.7/sdk-authentication-overview.html#authenticating-a-java-client-by-certificate, using com.couchbase.client.java.auth.CertAuthenticator. Something I just realized is that Couchbase now also supports RBAC auth that includes the username in the connection string. See [example](https://docs.couchbase.com/java-sdk/2.7/start-using-sdk.html). This should also be implemented in a separate PR.
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313806 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m

Work Description: htran1 commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283088

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }

-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }

   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();

-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {

Review comment:
Is it no longer true that the first BatchInfo is for the parent and not the PK chunking batches, or are you considering it safe to apply the same check since the state should be Completed?

Issue Time Tracking
-------------------

Worklog Id: (was: 313806)
Time Spent: 5h 20m (was: 5h 10m)

> Add feature that enables PK-chunking in partition
> -------------------------------------------------
>
>         Key: GOBBLIN-865
>         URL: https://issues.apache.org/jira/browse/GOBBLIN-865
>     Project: Apache Gobblin
>  Issue Type: Task
>    Reporter: Alex Li
>    Priority: Major
>      Labels: salesforce
>  Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a giant query into multiple sub-queries. There are 3 mechanisms:
> * simple partition (equally split by time)
> * dynamic pre-partition (generate histogram and split by row numbers)
> * user specified partition (set up time range in job file)
> However, there are tables like Task and Contract that fail from time to time to fetch full data.
> We may want to utilize PK-chunking to partition the query.
>
> The pk-chunking doc from SFDC - https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
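Per the SFDC doc linked in the issue description above, PK chunking is enabled by sending the `Sforce-Enable-PKChunking` header on the Bulk API job request. A minimal sketch of composing that header; the helper class and method names are hypothetical, only the header name and value syntax come from the Salesforce docs:

```java
public class PkChunkingHeader {
  // Header name per the SFDC async API docs.
  static final String HEADER_NAME = "Sforce-Enable-PKChunking";

  // Render the header value for a given chunk size, e.g. "chunkSize=100000".
  static String headerValue(int chunkSize) {
    return "chunkSize=" + chunkSize;
  }

  public static void main(String[] args) {
    // A bulk job request would carry: Sforce-Enable-PKChunking: chunkSize=100000
    System.out.println(HEADER_NAME + ": " + headerValue(100000));
  }
}
```

With this header set, Salesforce splits the query into batches of up to `chunkSize` records by primary key ranges, which is the mechanism the PR builds its partitioning on.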
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313807 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m

Work Description: htran1 commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325267416

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
   if (usingPkChunking && bulkBatchInfo.getState() == BatchStateEnum.NotProcessed) {

Review comment:
Remove this block too, since usingPkChunking is no longer set.

Issue Time Tracking
-------------------

Worklog Id: (was: 313807)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313803 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m

Work Description: htran1 commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325284644

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;

+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)

Review comment:
How come this check was removed? If the user has specified partitions, then that should override PK chunking.

Issue Time Tracking
-------------------

Worklog Id: (was: 313803)
Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313805 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m

Work Description: htran1 commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325264698

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;

+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)
-        || state.getPropAsInt(ConfigurationKeys.SOURCE_MAX_NUMBER_OF_PARTITIONS,
-            ConfigurationKeys.DEFAULT_MAX_NUMBER_OF_PARTITIONS) > PK_CHUNKING_MAX_PARTITIONS_LIMIT) {
-      if (state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false)) {
-        log.warn("Max partitions too high, so PK chunking is not enabled");
-      }
-
-      this.pkChunking = false;
-    } else {
-      this.pkChunking = state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false);
-    }
+    this.workUnitState = state;
+    this.sfConnector = (SalesforceConnector) this.connector;

     this.pkChunkingSize = Math.max(MIN_PK_CHUNKING_SIZE,
-        Math.min(MAX_PK_CHUNKING_SIZE, state.getPropAsInt(PK_CHUNKING_SIZE_KEY, DEFAULT_PK_CHUNKING_SIZE)));
+        Math.min(MAX_PK_CHUNKING_SIZE, state.getPropAsInt(PARTITION_PK_CHUNKING_SIZE, DEFAULT_PK_CHUNKING_SIZE)));

-    this.pkChunkingSkipCountCheck = state.getPropAsBoolean(PK_CHUNKING_SKIP_COUNT_CHECK, DEFAULT_PK_CHUNKING_SKIP_COUNT_CHECK);
+    this.pkChunkingSkipCountCheck = true;  // won't be able to get count

Review comment:
Should remove this, since the config is no longer used.

Issue Time Tracking
-------------------

Worklog Id: (was: 313805)
Time Spent: 5h 10m (was: 5h)
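The constructor in the diff above clamps the configured PK chunking size into a fixed range with nested `Math.max`/`Math.min`. A self-contained sketch of that pattern; the bound values here are assumptions for illustration (Salesforce documents a hard chunkSize limit of 250,000 and a default of 100,000), not the actual constants in SalesforceExtractor:

```java
public class PkChunkingSizeClamp {
  static final int MIN_PK_CHUNKING_SIZE = 20000;      // assumed lower bound
  static final int MAX_PK_CHUNKING_SIZE = 250000;     // assumed; SFDC's documented hard limit
  static final int DEFAULT_PK_CHUNKING_SIZE = 100000; // assumed; SFDC's documented default

  // Same shape as the constructor code: clamp a configured value into
  // [MIN_PK_CHUNKING_SIZE, MAX_PK_CHUNKING_SIZE].
  static int clampChunkSize(int configured) {
    return Math.max(MIN_PK_CHUNKING_SIZE, Math.min(MAX_PK_CHUNKING_SIZE, configured));
  }

  public static void main(String[] args) {
    System.out.println(clampChunkSize(5));        // below min -> 20000
    System.out.println(clampChunkSize(1000000));  // above max -> 250000
    System.out.println(clampChunkSize(DEFAULT_PK_CHUNKING_SIZE));  // in range -> 100000
  }
}
```

The `max(min, min(max, x))` nesting is a common idiom for bounding a user-supplied config value without branching.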
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313802 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m

Work Description: htran1 commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283786

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }

-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }

   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();

-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
+      BatchStateEnum state = bi.getState();
+      while (state != BatchStateEnum.Completed && state != BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+        Thread.sleep(waitInterval * 1000);
         bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());

Review comment:
You are assigning to the outer loop variable. Not sure if this is intentional. Also, I don't see state being updated, so won't this loop forever?

Issue Time Tracking
-------------------

Worklog Id: (was: 313802)
Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313804 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:08
Start Date: 17/Sep/19 17:08
Worklog Time Spent: 10m

Work Description: htran1 commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325281910

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());

Review comment:
Remove this?

Issue Time Tracking
-------------------

Worklog Id: (was: 313804)
Time Spent: 5h 10m (was: 5h)
[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325264698

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;

+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)
-        || state.getPropAsInt(ConfigurationKeys.SOURCE_MAX_NUMBER_OF_PARTITIONS,
-        ConfigurationKeys.DEFAULT_MAX_NUMBER_OF_PARTITIONS) > PK_CHUNKING_MAX_PARTITIONS_LIMIT) {
-      if (state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false)) {
-        log.warn("Max partitions too high, so PK chunking is not enabled");
-      }
-
-      this.pkChunking = false;
-    } else {
-      this.pkChunking = state.getPropAsBoolean(ENABLE_PK_CHUNKING_KEY, false);
-    }
+    this.workUnitState = state;
+    this.sfConnector = (SalesforceConnector) this.connector;

     this.pkChunkingSize = Math.max(MIN_PK_CHUNKING_SIZE,
-        Math.min(MAX_PK_CHUNKING_SIZE, state.getPropAsInt(PK_CHUNKING_SIZE_KEY, DEFAULT_PK_CHUNKING_SIZE)));
+        Math.min(MAX_PK_CHUNKING_SIZE, state.getPropAsInt(PARTITION_PK_CHUNKING_SIZE, DEFAULT_PK_CHUNKING_SIZE)));

-    this.pkChunkingSkipCountCheck = state.getPropAsBoolean(PK_CHUNKING_SKIP_COUNT_CHECK, DEFAULT_PK_CHUNKING_SKIP_COUNT_CHECK);
+    this.pkChunkingSkipCountCheck = true;  // won't be able to get count

Review comment: Should remove this since the config is no longer used.
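The constructor in the diff above clamps the configured chunk size into the range [MIN_PK_CHUNKING_SIZE, MAX_PK_CHUNKING_SIZE] with a nested Math.max/Math.min. A minimal, self-contained sketch of that clamp follows; the constant values here are illustrative placeholders, not Gobblin's actual defaults (though 250,000 is Salesforce's documented per-chunk maximum):

```java
// Illustrative clamp of a configured PK-chunking size, mirroring the
// Math.max(MIN, Math.min(MAX, configured)) pattern in the constructor above.
// The constants below are placeholders, not Gobblin's real values.
class PkChunkingSizeClamp {
  static final int MIN_PK_CHUNKING_SIZE = 1_000;
  static final int MAX_PK_CHUNKING_SIZE = 250_000;

  static int clamp(int configured) {
    return Math.max(MIN_PK_CHUNKING_SIZE, Math.min(MAX_PK_CHUNKING_SIZE, configured));
  }

  public static void main(String[] args) {
    System.out.println(clamp(50));       // below the floor -> 1000
    System.out.println(clamp(100_000));  // in range -> unchanged
    System.out.println(clamp(900_000));  // above the ceiling -> 250000
  }
}
```

The same idiom works for any bounded configuration value: the outer max enforces the floor, the inner min enforces the ceiling.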
[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325283786

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs)
   @Override
   public void closeConnection() throws Exception {
     if (this.bulkConnection != null
-        && !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) {
+        && !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) {
       log.info("Closing salesforce bulk job connection");
-      this.bulkConnection.closeJob(this.bulkJob.getId());
+      this.bulkConnection.closeJob(this.getBulkJobId());
     }
   }

-  public static List constructGetCommand(String restQuery) {
+  private static List constructGetCommand(String restQuery) {
     return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET));
   }

   /**
    * Waits for the PK batches to complete. The wait will stop after all batches are complete or on the first failed batch
    * @param batchInfoList list of batch info
-   * @param retryInterval the polling interval
+   * @param waitInterval the polling interval
    * @return the last {@link BatchInfo} processed
    * @throws InterruptedException
    * @throws AsyncApiException
    */
-  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval)
+  private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval)
       throws InterruptedException, AsyncApiException {
     BatchInfo batchInfo = null;
     BatchInfo[] batchInfos = batchInfoList.getBatchInfo();
-    // Wait for all batches other than the first one. The first one is not processed in PK chunking mode
-    for (int i = 1; i < batchInfos.length; i++) {
-      BatchInfo bi = batchInfos[i];
-
-      // get refreshed job status
-      bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());
-
-      while ((bi.getState() != BatchStateEnum.Completed)
-          && (bi.getState() != BatchStateEnum.Failed)) {
-        Thread.sleep(retryInterval * 1000);
+    for (BatchInfo bi : batchInfos) {
+      BatchStateEnum state = bi.getState();
+      while (state != BatchStateEnum.Completed && state != BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+        Thread.sleep(waitInterval * 1000);
         bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());

Review comment: You are assigning to the outside loop variable. Not sure if this is intentional. Also, I don't see state being updated, so won't this loop forever?
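The bug flagged in this review (the loop tests a `state` snapshot taken before the wait, so it never observes the refreshed batch state) can be shown with a self-contained sketch. The enum, the simulated server, and the lookup function below are stand-ins, not the real Salesforce SDK types:

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch of the corrected polling loop: the state must be
// re-read on every iteration, otherwise the snapshot taken before the loop
// never changes and the loop spins forever. All types here are stand-ins
// for the Salesforce SDK's BatchInfo/BatchStateEnum, not the real API.
class PkBatchPolling {
  enum State { InProgress, Completed, Failed, NotProcessed }

  // Simulated server: each poll moves a batch one step closer to Completed.
  static final Map<String, Integer> pollsLeft = new HashMap<>();

  static State getBatchState(String batchId) {
    int left = pollsLeft.merge(batchId, -1, Integer::sum);
    return left <= 0 ? State.Completed : State.InProgress;
  }

  static State waitForBatch(String batchId) {
    State state = getBatchState(batchId);
    while (state != State.Completed && state != State.Failed && state != State.NotProcessed) {
      // The real code would Thread.sleep(waitInterval * 1000) here.
      state = getBatchState(batchId);  // refresh the state each iteration: this is the fix
    }
    return state;
  }

  public static void main(String[] args) {
    pollsLeft.put("batch-1", 3);
    System.out.println(waitForBatch("batch-1"));  // Completed
  }
}
```

Without the `state = getBatchState(...)` reassignment inside the loop body, `waitForBatch` would never terminate for a batch that starts out InProgress, which is exactly htran1's objection.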
[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325267416

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
   if (usingPkChunking && bulkBatchInfo.getState() == BatchStateEnum.NotProcessed) {

Review comment: Remove this block too since usingPkChunking is no longer set.
[GitHub] [incubator-gobblin] htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
htran1 commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325284644

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;

+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)

Review comment: How come this check was removed? If the user has specified partitions, then that should override PK chunking.
[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al…
shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al… URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325302830

## File path: gobblin-modules/gobblin-kafka-09/src/main/java/org/apache/gobblin/kafka/client/Kafka09ConsumerClient.java

@@ -162,6 +170,35 @@ public KafkaConsumerRecord apply(ConsumerRecord input) {
     });
   }

+  @Override
+  public Map metrics() {
+    Map kafkaMetrics = (Map) this.consumer.metrics();
+    Map codaHaleMetricMap = new HashMap<>();

Review comment: do we need a new instance every time? This diff doesn't show when and how this method is being called, so it's hard to say whether this new instance and copy on each call to metrics() is a good idea or not.
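shirshanka's question is whether allocating a fresh HashMap on every metrics() call is necessary. One common alternative is to build the gauge map once and have each gauge read the live Kafka value on demand, so repeated calls allocate nothing. A self-contained sketch of that pattern; the `Gauge` interface and the metric source below are stand-ins for the Coda Hale and Kafka types, not the real APIs:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: instead of copying the Kafka metric map into a new HashMap on every
// call, build a map of gauges once; each gauge reads the live value when asked.
// Gauge and the Supplier-based metric source are stand-ins, not the real
// Coda Hale / Kafka classes.
class MetricBridge {
  interface Gauge<T> { T getValue(); }

  private final Map<String, Supplier<Double>> kafkaMetrics;  // stand-in for consumer.metrics()
  private Map<String, Gauge<Double>> cached;                 // built once, reused

  MetricBridge(Map<String, Supplier<Double>> kafkaMetrics) {
    this.kafkaMetrics = kafkaMetrics;
  }

  Map<String, Gauge<Double>> getMetrics() {
    if (cached == null) {
      cached = new HashMap<>();
      for (Map.Entry<String, Supplier<Double>> e : kafkaMetrics.entrySet()) {
        cached.put(e.getKey(), () -> e.getValue().get());  // reads the live value on demand
      }
    }
    return cached;
  }

  public static void main(String[] args) {
    Map<String, Supplier<Double>> source = new HashMap<>();
    double[] lag = {5.0};
    source.put("records-lag", () -> lag[0]);
    MetricBridge bridge = new MetricBridge(source);
    Gauge<Double> g = bridge.getMetrics().get("records-lag");
    lag[0] = 9.0;                      // the live value changes...
    System.out.println(g.getValue()); // ...and the cached gauge observes it
  }
}
```

This only works if the underlying metric set is stable; if Kafka can register new metrics after the first call, the cache would need invalidation, which is likely part of what the reviewer wants the call sites to clarify.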
[jira] [Work logged] (GOBBLIN-876) Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics to be reported
[ https://issues.apache.org/jira/browse/GOBBLIN-876?focusedWorklogId=313828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313828 ]

ASF GitHub Bot logged work on GOBBLIN-876:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:49
Start Date: 17/Sep/19 17:49
Worklog Time Spent: 10m
Work Description: shirshanka commented on pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al… URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325302830 (same review comment as above: "do we need a new instance every time? ...")

Issue Time Tracking
-------------------

Worklog Id: (was: 313828)
Time Spent: 2h 50m (was: 2h 40m)

> Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics to be reported
>
>                 Key: GOBBLIN-876
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-876
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-kafka
>    Affects Versions: 0.15.0
>            Reporter: Sudarshan Vasudevan
>            Assignee: Shirshanka Das
>            Priority: Major
>             Fix For: 0.15.0
>
> Newer Kafka consumers expose a metrics() API that reports a number of consumer metrics such as lag and latency, which are very useful for monitoring and debugging. We expose a metrics() API in GobblinKafkaConsumerClient to allow consumer metrics to be reported.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[GitHub] [incubator-gobblin] shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al…
shirshanka commented on a change in pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al… URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325301544

## File path: gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/kafka/client/GobblinKafkaConsumerClient.java

@@ -86,6 +89,15 @@
    */
   public Iterator consume(KafkaPartition partition, long nextOffset, long maxOffset);

+  /**
+   * API to return underlying Kafka consumer metrics. The individual implementations must translate
+   * org.apache.kafka.common.Metric to Coda Hale Metrics.
+   * @return
+   */
+  public default Map metrics() {

Review comment: since metrics is not a verb... maybe this method should be called getMetrics()?
[jira] [Work logged] (GOBBLIN-876) Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics to be reported
[ https://issues.apache.org/jira/browse/GOBBLIN-876?focusedWorklogId=313820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313820 ]

ASF GitHub Bot logged work on GOBBLIN-876:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 17:46
Start Date: 17/Sep/19 17:46
Worklog Time Spent: 10m
Work Description: shirshanka commented on pull request #2730: GOBBLIN-876: Expose metrics() API in GobblinKafkaConsumerClient to al… URL: https://github.com/apache/incubator-gobblin/pull/2730#discussion_r325301544 (same review comment as above: "since metrics is not a verb... maybe this method should be called getMetrics()?")

Issue Time Tracking
-------------------

Worklog Id: (was: 313820)
Time Spent: 2h 40m (was: 2.5h)

> Expose metrics() API in GobblinKafkaConsumerClient to allow consume metrics to be reported
>
>                 Key: GOBBLIN-876
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-876
[jira] [Work logged] (GOBBLIN-880) Bump CouchbaseWriter Couchbase SDK version + write docs + cert based auth + enable TTL + dnsSrv
[ https://issues.apache.org/jira/browse/GOBBLIN-880?focusedWorklogId=313772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313772 ]

ASF GitHub Bot logged work on GOBBLIN-880:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 16:11
Start Date: 17/Sep/19 16:11
Worklog Time Spent: 10m
Work Description: menarguez commented on issue #2734: GOBBLIN-880 Bump CouchbaseWriter Couchbase SDK version + write docs +… URL: https://github.com/apache/incubator-gobblin/pull/2734#issuecomment-532290703

@yukuai518 I bumped the Couchbase version so as to introduce certificate-based authentication, as shown in the Couchbase Java SDK docs at https://docs.couchbase.com/java-sdk/2.7/sdk-authentication-overview.html#authenticating-a-java-client-by-certificate, using com.couchbase.client.java.auth.CertAuthenticator. Something I just realized is that Couchbase now also supports RBAC auth that includes the username in the connection string; see the [example](https://docs.couchbase.com/java-sdk/2.7/start-using-sdk.html). This should also be implemented in a separate PR.

Issue Time Tracking
-------------------

Worklog Id: (was: 313772)
Remaining Estimate: 167h 10m (was: 167h 20m)
Time Spent: 50m (was: 40m)

> Bump CouchbaseWriter Couchbase SDK version + write docs + cert based auth + enable TTL + dnsSrv
>
>                 Key: GOBBLIN-880
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-880
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-couchbase
>            Reporter: Michael A Menarguez
>            Assignee: Shirshanka Das
>            Priority: Major
>              Labels: Couchbase
>             Fix For: 0.15.0
>
>   Original Estimate: 168h
>          Time Spent: 50m
>  Remaining Estimate: 167h 10m
>
> CURRENT ISSUES
> Currently CouchbaseWriter.java lacks the ability to do the following:
> # Use certificate based authentication
> # Set document expiry (TTL)
> ** based on write time
> ** based on an offset from a specified field contained in the record's data (JSON)
> ** (WILL NOT ADDRESS) set expiry based on a field contained in the record's data
> # Set DNS SRV for bootstrap host discovery setting
> # Missing documentation on CouchbaseWriter usage
> # Testing does not bring in CouchbaseMock correctly and causes problems while bumping com.couchbase.client:java-client
>
> PROPOSED SOLUTIONS
> # Add logic to connect to the buckets using certificate-based auth (will need to bump com.couchbase.client:java-client to a newer version like 2.7.6) and associated configs
> # TTL implementation
> ## Add configs to allow setting a TTL (documentTTL) and also specify the time units (documentTTLUnits) of these settings
> ## Add logic to specify the path to the field containing the source timestamp (documentTTLOriginField) and its units (documentTTLOriginUnits), to disambiguate between UNIX (sec) timestamps and other formats like timestamps in milliseconds
> ## N/A, but the logic would be similar to (2)
> # Add missing dnsSrv config
> # Write proper documentation
> # Bring in CouchbaseMock from Gradle and adapt existing unit tests.
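The TTL proposal above combines a record's origin timestamp (in documentTTLOriginUnits) with the configured TTL (in documentTTLUnits). A self-contained sketch of that arithmetic; the config names come from the issue text, but the helper and the choice of seconds as the output unit are assumptions, not the final implementation:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the proposed TTL computation: normalize the record's origin
// timestamp and the configured TTL to seconds, then add them to get an
// absolute expiry. documentTTLOriginUnits / documentTTLUnits map to the
// TimeUnit arguments; the helper itself is illustrative, not Gobblin code.
class CouchbaseTtlSketch {
  static long expirySeconds(long originTimestamp, TimeUnit originUnit,
                            long ttl, TimeUnit ttlUnit) {
    return originUnit.toSeconds(originTimestamp) + ttlUnit.toSeconds(ttl);
  }

  public static void main(String[] args) {
    // record stamped at UNIX time 1_568_000_000_000 ms, with a TTL of 2 hours
    System.out.println(expirySeconds(1_568_000_000_000L, TimeUnit.MILLISECONDS,
                                     2, TimeUnit.HOURS));  // 1568007200
  }
}
```

Normalizing through TimeUnit is what lets the config disambiguate UNIX-second timestamps from millisecond timestamps, as item 2.2 of the proposal requires.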
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313879 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m
Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325344285

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());
   if (usingPkChunking && bulkBatchInfo.getState() == BatchStateEnum.NotProcessed) {

Review comment: Thanks! removed.

Issue Time Tracking
-------------------

Worklog Id: (was: 313879)
Time Spent: 5h 50m (was: 5h 40m)

> Add feature that enables PK-chunking in partition
>
>                 Key: GOBBLIN-865
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-865
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Alex Li
>            Priority: Major
>              Labels: salesforce
>          Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a giant query into multiple sub-queries. There are 3 mechanisms:
> * simple partition (equally split by time)
> * dynamic pre-partition (generate histogram and split by row numbers)
> * user specified partition (set up time range in job file)
> However, tables like Task and Contract fail from time to time to fetch full data.
> We may want to utilize PK-chunking to partition the query.
>
> The pk-chunking doc from SFDC:
> https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm
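Per the linked Salesforce doc, PK chunking is requested by sending the Sforce-Enable-PKChunking header when the Bulk API job is created (with the Java async SDK this is typically done on the BulkConnection before creating the job). A minimal sketch of composing that header value follows; the helper class is illustrative, not Gobblin code, and the SDK call in the comment is an assumption about how it would be wired in:

```java
// Sketch: compose the Sforce-Enable-PKChunking header value described in the
// Salesforce Bulk API docs. Illustrative helper, not Gobblin code; the
// resulting value would be attached to the bulk job request, e.g. (assumed)
// bulkConnection.addHeader(HEADER_NAME, headerValue(250_000)).
class PkChunkingHeader {
  static final String HEADER_NAME = "Sforce-Enable-PKChunking";

  static String headerValue(int chunkSize) {
    return "chunkSize=" + chunkSize;
  }

  public static void main(String[] args) {
    System.out.println(HEADER_NAME + ": " + headerValue(250_000));
  }
}
```

With the header set, Salesforce splits the extract into batches of at most chunkSize primary-key values each, which is what lets the connector pull large tables like Task and Contract without a single giant query.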
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313877 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m
Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325343123

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -779,7 +880,7 @@ public boolean bulkApiLogin() throws Exception {
   BatchInfoList batchInfoList = this.bulkConnection.getBatchInfoList(this.bulkJob.getId());

Review comment: I try not to change the existing code in **getQueryResultIds**, and this line is necessary. The **getQueryResultIds** function's return value is a list of **BatchIdAndResultId**, that is, result IDs with their corresponding batch IDs. One bulk job ID can have multiple results even when it is not PK chunking. I am not using **getQueryResultIds** for PK chunking. If we want to do further code refactoring, let's do it in another ticket.

Issue Time Tracking
-------------------

Worklog Id: (was: 313877)
Time Spent: 5h 40m (was: 5.5h)

> Add feature that enables PK-chunking in partition
>
>                 Key: GOBBLIN-865
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-865
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313875 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m
Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325354169

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

(On the waitForPkBatches hunk quoted earlier, in particular:)

+      BatchStateEnum state = bi.getState();
+      while (state != BatchStateEnum.Completed && state != BatchStateEnum.Failed && state != BatchStateEnum.NotProcessed) {
+        Thread.sleep(waitInterval * 1000);
         bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId());

Review comment: Good catch, thanks! I should not use the state variable. My test worked because I was using a breakpoint, so there was enough time for SFDC to finish executing. I did more refactoring of this part and am pushing the code in advance; I will test today and update you. Thanks!

Issue Time Tracking
-------------------

Worklog Id: (was: 313875)
Time Spent: 5h 40m (was: 5.5h)

> Add feature that enables PK-chunking in partition
>
>                 Key: GOBBLIN-865
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-865
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313876 ]

ASF GitHub Bot logged work on GOBBLIN-865:
------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/19 19:50
Start Date: 17/Sep/19 19:50
Worklog Time Spent: 10m
Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325340740

## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

(On the constructor hunk quoted earlier, in particular:)

-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)

Review comment: This part of the code was only for **PK chunking**. We don't need second-level PK chunking any more (if we have a normal pre-partition, we should not use PK chunking; it burns out the request quota). I created a separate function, getQueryResultIdsPkChunking, for PK chunking, and made it not depend on the class member this.pkChunking; therefore pkChunking was removed.

Issue Time Tracking
-------------------

Worklog Id: (was: 313876)
Time Spent: 5h 40m (was: 5.5h)

> Add feature that enables PK-chunking in partition
>
>                 Key: GOBBLIN-865
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-865
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313878 ] ASF GitHub Bot logged work on GOBBLIN-865: -- Author: ASF GitHub Bot Created on: 17/Sep/19 19:50 Start Date: 17/Sep/19 19:50 Worklog Time Spent: 10m Work Description: arekusuri commented on pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325346830 ## File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java ## @@ -979,39 +1118,33 @@ private void fetchResultBatchWithRetry(RecordSetList rs) @Override public void closeConnection() throws Exception { if (this.bulkConnection != null -&& !this.bulkConnection.getJobStatus(this.bulkJob.getId()).getState().toString().equals("Closed")) { +&& !this.bulkConnection.getJobStatus(this.getBulkJobId()).getState().toString().equals("Closed")) { log.info("Closing salesforce bulk job connection"); - this.bulkConnection.closeJob(this.bulkJob.getId()); + this.bulkConnection.closeJob(this.getBulkJobId()); } } - public static List constructGetCommand(String restQuery) { + private static List constructGetCommand(String restQuery) { return Arrays.asList(new RestApiCommand().build(Arrays.asList(restQuery), RestApiCommandType.GET)); } /** * Waits for the PK batches to complete. 
The wait will stop after all batches are complete or on the first failed batch * @param batchInfoList list of batch info - * @param retryInterval the polling interval + * @param waitInterval the polling interval * @return the last {@link BatchInfo} processed * @throws InterruptedException * @throws AsyncApiException */ - private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int retryInterval) + private BatchInfo waitForPkBatches(BatchInfoList batchInfoList, int waitInterval) throws InterruptedException, AsyncApiException { BatchInfo batchInfo = null; BatchInfo[] batchInfos = batchInfoList.getBatchInfo(); -// Wait for all batches other than the first one. The first one is not processed in PK chunking mode -for (int i = 1; i < batchInfos.length; i++) { - BatchInfo bi = batchInfos[i]; - - // get refreshed job status - bi = this.bulkConnection.getBatchInfo(this.bulkJob.getId(), bi.getId()); - - while ((bi.getState() != BatchStateEnum.Completed) - && (bi.getState() != BatchStateEnum.Failed)) { -Thread.sleep(retryInterval * 1000); +for (BatchInfo bi: batchInfos) { Review comment: I had a ticket for this - https://jira01.corp.linkedin.com:8443/browse/DSS-1 pkchunking is using this function. Sometimes, the parent may not be the first element in the list. (see screenshot in ticket) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 313878)
Time Spent: 5h 40m (was: 5.5h)

> Add feature that enables PK-chunking in partition
> --
> Key: GOBBLIN-865
> URL: https://issues.apache.org/jira/browse/GOBBLIN-865
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Alex Li
> Priority: Major
> Labels: salesforce
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> In the SFDC (Salesforce) connector, we have partitioning mechanisms to split a giant query into multiple sub-queries. There are three mechanisms:
> * simple partition (split equally by time)
> * dynamic pre-partition (generate a histogram and split by row counts)
> * user-specified partition (time ranges set in the job file)
> However, tables such as Task and Contract fail from time to time to fetch full data.
> We may want to utilize PK chunking to partition the query.
>
> The PK-chunking doc from SFDC:
> https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm

-- This message was sent by Atlassian Jira (v8.3.2#803003)
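Per the Salesforce doc linked in the issue description, PK chunking is switched on per Bulk API job with the `Sforce-Enable-PKChunking` request header, whose `chunkSize` option controls records per chunk (Salesforce documents a default of 100,000 and a maximum of 250,000). The helper below is an illustrative sketch of building that header, not Gobblin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PkChunkingHeader {

  // Builds the HTTP header that enables PK chunking on a Bulk API job.
  // The header name and chunkSize option come from the Salesforce doc;
  // this helper itself is hypothetical.
  static Map<String, String> pkChunkingHeader(int chunkSize) {
    Map<String, String> headers = new LinkedHashMap<>();
    headers.put("Sforce-Enable-PKChunking", "chunkSize=" + chunkSize);
    return headers;
  }

  public static void main(String[] args) {
    // With chunkSize=250000, a 1M-row table would be split into 4 chunks,
    // each queryable as an independent batch.
    System.out.println(pkChunkingHeader(250000));
  }
}
```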
[GitHub] [incubator-gobblin] arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
arekusuri commented on a change in pull request #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#discussion_r325340740

File path: gobblin-salesforce/src/main/java/org/apache/gobblin/salesforce/SalesforceExtractor.java

@@ -144,28 +142,18 @@
   private final boolean pkChunkingSkipCountCheck;
   private final boolean bulkApiUseQueryAll;

+  private WorkUnitState workUnitState;
+
   public SalesforceExtractor(WorkUnitState state) {
     super(state);
-    this.sfConnector = (SalesforceConnector) this.connector;
-
-    // don't allow pk chunking if max partitions too high or have user specified partitions
-    if (state.getPropAsBoolean(Partitioner.HAS_USER_SPECIFIED_PARTITIONS, false)

Review comment: This code was only for **PK chunking**. We don't need second-level PK chunking any more (if we have a normal pre-partition, we should not use PK chunking; it burns out the request quota). I created a separate function, getQueryResultIdsPkChunking, for PK chunking and made it not depend on the class member this.pkChunking, so pkChunking could be removed.

With regards, Apache Git Services
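The guard being removed from the constructor (PK chunking is disallowed when partitions were user-specified or the configured partition count indicates normal pre-partitioning, since stacking PK chunking on top would multiply Bulk API requests and burn request quota) can be sketched as a standalone predicate. The parameter names below are illustrative, not Gobblin's actual configuration keys.

```java
public class PkChunkingGuard {

  // Hypothetical sketch of the guard described in the review comment:
  // skip PK chunking when the user already partitioned the query, or when
  // the configured max partition count exceeds the PK-chunking threshold.
  static boolean allowPkChunking(boolean hasUserSpecifiedPartitions,
                                 int maxPartitions,
                                 int pkChunkingMaxPartitions) {
    if (hasUserSpecifiedPartitions) {
      return false;  // the job file already splits the query explicitly
    }
    // Normal pre-partitioning plus PK chunking would burn request quota.
    return maxPartitions <= pkChunkingMaxPartitions;
  }

  public static void main(String[] args) {
    System.out.println(allowPkChunking(true, 1, 8));   // false
    System.out.println(allowPkChunking(false, 4, 8));  // true
    System.out.println(allowPkChunking(false, 16, 8)); // false
  }
}
```

Hoisting the decision out of the constructor, as the PR does with getQueryResultIdsPkChunking, keeps the extractor free of PK-chunking state (`this.pkChunking`) that only one code path needs.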
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313924 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--
Author: ASF GitHub Bot
Created on: 17/Sep/19 21:15
Start Date: 17/Sep/19 21:15
Worklog Time Spent: 10m

Work Description: codecov-io commented on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100

# [Codecov](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=h1) Report
> Merging [#2722](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-gobblin/commit/9bf9a882427e98e7f4ef089c4ca1bde42f4b36a3?src=pr=desc) will **decrease** coverage by `0.02%`.
> The diff coverage is `1.66%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/graphs/tree.svg?width=650=4MgURJ0bGc=150=pr)](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master    #2722      +/-   ##
============================================
- Coverage     45.04%   45.01%   -0.03%
- Complexity     8739     8770      +31
  Files          1880     1886       +6
  Lines         70205    70591     +386
  Branches       7707     7739      +32
============================================
+ Hits          31623    31779     +156
- Misses        35651    35864     +213
- Partials       2931     2948      +17
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-gobblin/pull/2722?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...obblin/salesforce/SalesforceConfigurationKeys.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUNvbmZpZ3VyYXRpb25LZXlzLmphdmE=) | `0% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...apache/gobblin/salesforce/SalesforceExtractor.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZUV4dHJhY3Rvci5qYXZh) | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
| [...rg/apache/gobblin/salesforce/SalesforceSource.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zYWxlc2ZvcmNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NhbGVzZm9yY2UvU2FsZXNmb3JjZVNvdXJjZS5qYXZh) | `19.74% <5.66%> (-3.02%)` | `12 <1> (+1)` | |
| [...obblin/service/monitoring/FlowStatusGenerator.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9GbG93U3RhdHVzR2VuZXJhdG9yLmphdmE=) | `82.14% <0%> (-7.15%)` | `11% <0%> (-1%)` | |
| [...apache/gobblin/runtime/local/LocalJobLauncher.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1ydW50aW1lL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3J1bnRpbWUvbG9jYWwvTG9jYWxKb2JMYXVuY2hlci5qYXZh) | `61.81% <0%> (-2.34%)` | `5% <0%> (ø)` | |
| [...ache/gobblin/couchbase/writer/CouchbaseWriter.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1tb2R1bGVzL2dvYmJsaW4tY291Y2hiYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvdWNoYmFzZS93cml0ZXIvQ291Y2hiYXNlV3JpdGVyLmphdmE=) | `64.39% <0%> (-1.89%)` | `15% <0%> (+4%)` | |
| [.../apache/gobblin/cluster/GobblinClusterManager.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1jbHVzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NsdXN0ZXIvR29iYmxpbkNsdXN0ZXJNYW5hZ2VyLmphdmE=) | `53.91% <0%> (-0.51%)` | `27% <0%> (ø)` | |
| [...che/gobblin/hive/metastore/HiveMetaStoreUtils.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1oaXZlLXJlZ2lzdHJhdGlvbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvZ29iYmxpbi9oaXZlL21ldGFzdG9yZS9IaXZlTWV0YVN0b3JlVXRpbHMuamF2YQ==) | `31.69% <0%> (-0.15%)` | `12% <0%> (ø)` | |
| [...e/modules/flowgraph/datanodes/fs/AdlsDataNode.java](https://codecov.io/gh/apache/incubator-gobblin/pull/2722/diff?src=pr=tree#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9kdWxlcy9mbG93Z3JhcGgvZGF0YW5vZGVzL2ZzL0FkbHNEYXRhTm9kZS5qYXZh) | `50% <0%> (ø)` | `2% <0%> (ø)` | :arrow_down: |
[jira] [Work logged] (GOBBLIN-865) Add feature that enables PK-chunking in partition
[ https://issues.apache.org/jira/browse/GOBBLIN-865?focusedWorklogId=313923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-313923 ]

ASF GitHub Bot logged work on GOBBLIN-865:
--
Author: ASF GitHub Bot
Created on: 17/Sep/19 21:11
Start Date: 17/Sep/19 21:11
Worklog Time Spent: 10m

Work Description: codecov-io commented on issue #2722: GOBBLIN-865: Add feature that enables PK-chunking in partition
URL: https://github.com/apache/incubator-gobblin/pull/2722#issuecomment-531069100
[jira] [Created] (GOBBLIN-882) Fix Gobblin as a Service start scripts
William Lo created GOBBLIN-882:
--
Summary: Fix Gobblin as a Service start scripts
Key: GOBBLIN-882
URL: https://issues.apache.org/jira/browse/GOBBLIN-882
Project: Apache Gobblin
Issue Type: Improvement
Components: gobblin-service
Affects Versions: 0.15.0
Reporter: William Lo
Assignee: Abhishek Tiwari
Fix For: 0.15.0

-- This message was sent by Atlassian Jira (v8.3.4#803005)