[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688248#comment-16688248 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3128

    +1 LGTM, tested with various batch sizes and ran unit tests. Thanks for this improvement! Merged to master

> Introduce batch size limit in PutDatabaseRecord processor
> ---------------------------------------------------------
>
>                 Key: NIFI-5788
>                 URL: https://issues.apache.org/jira/browse/NIFI-5788
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>         Environment: Teradata DB
>            Reporter: Vadim
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
> Certain JDBC drivers do not support unlimited batch size in INSERT/UPDATE prepared SQL statements. Specifically, the Teradata JDBC driver ([https://downloads.teradata.com/download/connectivity/jdbc-driver]) fails the SQL statement when the batch overflows its internal limits.
> Dividing data into smaller chunks before PutDatabaseRecord is applied can work around the issue in certain scenarios, but in general this workaround is imperfect because the SQL statements would be executed in different transaction contexts and data integrity would not be preserved.
> The solution suggests the following:
> * introduce a new optional parameter in *PutDatabaseRecord*, *max_batch_size*, which defines the maximum batch size for INSERT/UPDATE statements; the default value zero (INFINITY) preserves the old behavior
> * divide the input into batches of the specified size and invoke PreparedStatement.executeBatch() for each batch
> Pull request: [https://github.com/apache/nifi/pull/3128]
>
> [EDIT] Changed batch_size to max_batch_size. The default value would be zero (INFINITY)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
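The chunking behavior the issue describes (one addBatch() per record, an executeBatch() whenever the configured limit is reached, and a final flush for leftovers) can be sketched independently of JDBC. This is a minimal model, not the PR's actual code; `countExecuteBatchCalls` is a hypothetical helper that just counts flushes:

```java
/**
 * Models the flush pattern proposed in NIFI-5788: addBatch() per record,
 * executeBatch() each time the configured limit is reached, and a final
 * executeBatch() for any leftover records. A maxBatchSize of zero means
 * "unlimited", i.e. a single executeBatch() at the end.
 */
public final class BatchCounter {

    public static int countExecuteBatchCalls(int recordCount, int maxBatchSize) {
        int flushes = 0;
        int currentBatchSize = 0;
        for (int i = 0; i < recordCount; i++) {
            currentBatchSize++;                        // stands in for ps.addBatch()
            if (maxBatchSize > 0 && currentBatchSize == maxBatchSize) {
                flushes++;                             // stands in for ps.executeBatch()
                currentBatchSize = 0;
            }
        }
        if (currentBatchSize > 0) {
            flushes++;                                 // final executeBatch() for the remainder
        }
        return flushes;
    }
}
```

For example, 10 records with a limit of 3 produce four executeBatch() calls (3+3+3+1), while a limit of 0 produces a single call, matching the old behavior.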
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688240#comment-16688240 ]

ASF subversion and git services commented on NIFI-5788:
---

Commit d319a3ef2f14317f29a1be5a189bc34f8b3fdbd6 in nifi's branch refs/heads/master from vadimar
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=d319a3e ]

NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

Renamed 'batch size' to 'Maximum Batch Size'. Changed default value of max_batch_size to zero (INFINITE)
Fixed parameter validation. Added unit tests

Signed-off-by: Matthew Burgess

This closes #3128
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688243#comment-16688243 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/3128
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679466#comment-16679466 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on the issue:

    https://github.com/apache/nifi/pull/3128

    Hi,
    Can you please review the latest commits? I committed the changes that address all the issues raised by reviewers. Thanks
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676856#comment-16676856 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231153599

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
         ps.addBatch();
    +    if (++currentBatchSize == batchSize) {
    --- End diff --

    True, I missed that override before, but I see it now. So definitely less valuable; the only thing it would provide would be troubleshooting guidance, "your bad data is roughly in this part of the file". Probably not worth it. Thanks!
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676660#comment-16676660 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231089816

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
    +static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --

    Oh. I see it now. The display label is "Bulk Size". I'll fix it to be "Maximum Batch Size". Thanks
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676649#comment-16676649 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231088684

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
         ps.addBatch();
    +    if (++currentBatchSize == batchSize) {
    --- End diff --

    I'm not sure this would be beneficial. PutDatabaseRecord works without autoCommit. It's all or nothing.
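vadimar's point above is that batching only changes how rows are shipped to the driver; because the processor runs with autoCommit disabled, the flow file still commits or rolls back as a whole. A minimal in-memory sketch of that all-or-nothing property (all names hypothetical; a `List` stands in for the table, a thrown exception for a SQL error):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Applies records in chunks but commits all-or-nothing, mimicking
 * multiple executeBatch() calls inside a single JDBC transaction.
 */
public final class AllOrNothingLoader {

    public static void load(List<String> records, int maxBatchSize, List<String> target) {
        List<String> staged = new ArrayList<>();       // the open, uncommitted transaction
        List<String> batch = new ArrayList<>();
        for (String record : records) {
            if (record == null) {                      // a "bad row" stands in for a SQL error;
                // nothing reaches target: the transaction is effectively rolled back
                throw new IllegalStateException("batch failed; nothing committed");
            }
            batch.add(record);                         // ps.addBatch()
            if (maxBatchSize > 0 && batch.size() == maxBatchSize) {
                staged.addAll(batch);                  // ps.executeBatch()
                batch.clear();
            }
        }
        staged.addAll(batch);                          // final executeBatch() for the remainder
        target.addAll(staged);                         // connection.commit()
    }
}
```

A failure mid-stream leaves `target` untouched even when earlier executeBatch() calls had succeeded, which is the data-integrity property the issue description asks for and the reason splitting flow files upstream is not equivalent.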
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676639#comment-16676639 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231087439

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
    +static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +    .name("put-db-record-batch-size")
    +    .displayName("Bulk Size")
    +    .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
    +        + " Non-positive value has the effect of infinite bulk size.")
    +    .defaultValue("-1")
    --- End diff --

    I'll change the default to be zero and the validator to NONNEGATIVE_INTEGER_VALIDATOR
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676638#comment-16676638 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231086664

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
    +static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --

    Agree regarding "Maximum Batch Size". Sounds better. What's "bulk size"? Is it relevant to this change?
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675777#comment-16675777 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230917511

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
         ps.addBatch();
    +    if (++currentBatchSize == batchSize) {
    --- End diff --

    Would it be beneficial to capture `currentBatchSize*batchIndex`, with `batchIndex` being incremented only after a successful call to `executeBatch()`, as an attribute? My thinking is, if you have a failure and only part of a batch was loaded, you could store how many rows were loaded successfully as an attribute.
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675775#comment-16675775 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230916140

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
    +static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +    .name("put-db-record-batch-size")
    +    .displayName("Bulk Size")
    +    .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
    +        + " Non-positive value has the effect of infinite bulk size.")
    +    .defaultValue("-1")
    --- End diff --

    I agree that `0` should be the default, and would replicate the current behavior of the processor, "All records in one batch".
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675386#comment-16675386 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230812123

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
    +static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +    .name("put-db-record-batch-size")
    +    .displayName("Bulk Size")
    +    .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
    +        + " Non-positive value has the effect of infinite bulk size.")
    +    .defaultValue("-1")
    --- End diff --

    What does a value of zero do? Would anyone ever use it? If not, perhaps zero is the best default to indicate infinite bulk size. If you do change it to zero, please change the validator to a NONNEGATIVE_INTEGER_VALIDATOR to match
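The zero-means-unlimited convention the reviewers settle on amounts to a small normalization step applied to the property value before the batching loop runs. A hedged sketch (hypothetical helper, not the PR's actual code):

```java
/**
 * Maps the configured "Maximum Batch Size" property value to the limit
 * used by the batching loop: zero (the default) means unlimited, i.e.
 * all records in one batch, preserving the processor's old behavior.
 */
public final class BatchSizeConfig {

    public static int effectiveLimit(int configured) {
        if (configured < 0) {
            // A NONNEGATIVE_INTEGER_VALIDATOR would reject this at
            // configuration time; guard here anyway for safety.
            throw new IllegalArgumentException("Maximum Batch Size must be >= 0");
        }
        return configured == 0 ? Integer.MAX_VALUE : configured;
    }
}
```

Treating zero as `Integer.MAX_VALUE` lets the loop use a single comparison against the limit rather than branching on "is batching enabled" at every record.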
[jira] [Commented] (NIFI-5788) Introduce batch size limit in PutDatabaseRecord processor
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675387#comment-16675387 ]

ASF GitHub Bot commented on NIFI-5788:
--

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230811717

    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
    +static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --

    We should be consistent here with "batch size" and "bulk size" in the naming of variables, documentation, etc. Maybe "Maximum Batch Size"?
[ https://issues.apache.org/jira/browse/NIFI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675072#comment-16675072 ]

ASF GitHub Bot commented on NIFI-5788:
--

GitHub user vadimar opened a pull request:

    https://github.com/apache/nifi/pull/3128

    NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

    Thank you for submitting a contribution to Apache NiFi. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Is your initial contribution a single, squashed commit?

    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vadimar/nifi-1 nifi-5788

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3128.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3128

commit 2f36c8b1a732e249238f5f6f53968e84c05b497c
Author: vadimar
Date:   2018-11-05T11:15:12Z

    NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor
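The batching behavior the ticket proposes (divide the input into chunks of the configured size and call `PreparedStatement.executeBatch()` per chunk, with 0 meaning unlimited) can be sketched in plain Java. The class and method names below are hypothetical, standing in for the real logic inside PutDatabaseRecord; a counter substitutes for the actual JDBC calls so the flushing logic can be shown in isolation.

```java
// Minimal sketch of the proposed batching logic (hypothetical names; the
// real change lives in PutDatabaseRecord.java, PR #3128). The batch is
// flushed -- executeBatch() in the real code -- whenever it reaches
// maxBatchSize; a limit of 0 preserves the old single-batch behavior.
public class BatchFlusher {

    /** Returns how many executeBatch() calls the given inputs would produce. */
    public static int countFlushes(int recordCount, int maxBatchSize) {
        int flushes = 0;
        int pending = 0;
        for (int i = 0; i < recordCount; i++) {
            pending++;                              // stmt.addBatch() in the real code
            if (maxBatchSize > 0 && pending == maxBatchSize) {
                flushes++;                          // stmt.executeBatch()
                pending = 0;
            }
        }
        if (pending > 0) {
            flushes++;                              // final flush of the remainder
        }
        return flushes;
    }

    public static void main(String[] args) {
        // 10 records, max batch size 4 -> batches of 4, 4, 2 = 3 flushes
        System.out.println(countFlushes(10, 4));
        // Max batch size 0 (unlimited) -> one flush, the pre-change behavior
        System.out.println(countFlushes(10, 0));
    }
}
```

Keeping all flushes inside one processor invocation is what distinguishes this from splitting the flow file upstream: every chunk still executes in the same transaction context, which is the data-integrity concern the ticket raises.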