Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-15 Thread via GitHub


mattyb149 closed pull request #8322: NIFI-12700: refactored PutKudu to optimize 
memory handling for AUTO_F…
URL: https://github.com/apache/nifi/pull/8322


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-15 Thread via GitHub


mattyb149 commented on PR #8501:
URL: https://github.com/apache/nifi/pull/8501#issuecomment-143606

   Closing as merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-15 Thread via GitHub


mattyb149 closed pull request #8501: NIFI-12700: refactored PutKudu to optimize 
memory handling for AUTO_F…
URL: https://github.com/apache/nifi/pull/8501


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-15 Thread via GitHub


mattyb149 commented on PR #8322:
URL: https://github.com/apache/nifi/pull/8322#issuecomment-141459

   Verified the requested changes were made. +1 LGTM, thanks for the 
improvement! Merging to main


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-15 Thread via GitHub


mattyb149 commented on PR #8501:
URL: https://github.com/apache/nifi/pull/8501#issuecomment-140400

   Verified the requested changes were made. +1 LGTM, thanks for the 
improvement! Merging to support/nifi-1.x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-14 Thread via GitHub


emiliosetiadarma commented on code in PR #8501:
URL: https://github.com/apache/nifi/pull/8501#discussion_r1525379676


##
nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/StandardPutKuduResult.java:
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.kudu;
+
+import org.apache.kudu.client.Operation;
+import org.apache.kudu.client.RowError;
+import org.apache.nifi.flowfile.FlowFile;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+public class StandardPutKuduResult extends PutKuduResult {
+private final Map operationFlowFileMap;
+private final List pendingRowErrors;
+private final Map> flowFileRowErrorsMap;
+
+public StandardPutKuduResult() {
+super();
+this.operationFlowFileMap = new HashMap<>();
+this.pendingRowErrors = new ArrayList<>();
+this.flowFileRowErrorsMap = new HashMap<>();
+}
+
+@Override
+public void recordOperation(final Operation operation) {
+operationFlowFileMap.put(operation, flowFile);
+}
+
+@Override
+public void addError(final RowError rowError) {
+// When this class is used to store results from processing FlowFiles, 
the FlushMode
+// is set to AUTO_FLUSH_BACKGROUND or MANUAL_FLUSH. In either case, we 
won't know which
+// FlowFile/Record we are currently processing as the RowErrors are 
obtained from the KuduSession
+// post-processing of the FlowFile/Record
+this.pendingRowErrors.add(rowError);
+}
+
+@Override
+public void resolveFlowFileToRowErrorAssociations() {
+flowFileRowErrorsMap.putAll(pendingRowErrors.stream()
+.filter(e -> operationFlowFileMap.get(e.getOperation()) != 
null)
+.collect(
+Collectors.groupingBy(e -> 
operationFlowFileMap.get(e.getOperation()))
+)
+);
+
+pendingRowErrors.clear();
+}
+
+@Override
+public boolean hasRowErrorsOrFailures() {
+if (!flowFileFailures.isEmpty()) {
+return true;
+}
+
+for (final Map.Entry> entry : 
flowFileRowErrorsMap.entrySet()) {
+if (!entry.getValue().isEmpty()) {
+return true;
+}
+}
+
+return false;

Review Comment:
   Will make the change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-14 Thread via GitHub


emiliosetiadarma commented on code in PR #8501:
URL: https://github.com/apache/nifi/pull/8501#discussion_r1525376745


##
nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/PutKuduResult.java:
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.kudu;
+
+import org.apache.kudu.client.Operation;
+import org.apache.kudu.client.RowError;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.serialization.record.Record;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public abstract class PutKuduResult {
+protected FlowFile flowFile;
+protected final Map flowFileFailures;
+private final Set processedFlowFiles;
+private final Map processedRecords;
+
+public PutKuduResult() {
+this.flowFile = null;
+
+this.flowFileFailures = new HashMap<>();
+this.processedFlowFiles = new HashSet<>();
+this.processedRecords = new HashMap<>();
+}
+
+public void setFlowFile(final FlowFile flowFile) {
+this.flowFile = flowFile;
+processedFlowFiles.add(flowFile);
+}
+
+public Set getProcessedFlowFiles() {
+return this.processedFlowFiles;
+}
+
+public int getProcessedRecordsForFlowFile(final FlowFile flowFile) {
+return this.processedRecords.getOrDefault(flowFile, 0);
+}
+
+/**
+ * Increments the number of {@link Record}s that has been successfully 
processed for this {@link FlowFile}
+ */
+public void incrementProcessedRecordsForFlowFile() {
+final int newCount = this.processedRecords.getOrDefault(flowFile, 0) + 
1;
+this.processedRecords.put(flowFile, newCount);
+}
+
+/**
+ * Records an {@link Operation} being processed for a specific {@link 
FlowFile}
+ * @param operation the {@link Operation} to record
+ */
+public abstract void recordOperation(final Operation operation);
+
+/**
+ * Records a {@link RowError} for the particular {@link FlowFile} that's 
being processed
+ * @param rowError the {@link RowError} to add
+ */
+public abstract void addError(final RowError rowError);
+
+/**
+ * Records a {@link List} of {@link RowError}s for the particular {@link 
FlowFile} that's being processed
+ * @param rowErrors the {@link List} of {@link RowError}s to add
+ */
+public void addErrors(final List rowErrors) {
+for (final RowError rowError : rowErrors) {
+addError(rowError);
+}
+}
+
+/**
+ * Records a failure (an {@link Exception} or a {@link RowError}) for the 
particular {@link FlowFile} that's being processed.
+ * A failure is defined as anything that stops the processing of the 
records in a {@link FlowFile}
+ * @param failure the {@link Exception} or {@link RowError} to add
+ */
+public void addFailure(final Object failure) {
+if (flowFileFailures.containsKey(flowFile)) {
+throw new IllegalStateException("A failure has already previously 
occurred while processing FlowFile.");
+}
+flowFileFailures.put(flowFile, failure);
+}
+
+
+/**
+ * Resolves the associations between {@link FlowFile} and the {@link 
RowError}s that occurred
+ * while processing them. This is only applicable in batch sesssion 
flushes, namely when
+ * using the {@code SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND} 
and
+ * {@code SessionConfiguration.FlushMode.MANUAL_FLUSH} flush modes. 
Otherwise, this
+ * function should be a no-op. This function should only be called once 
finished with processing
+ * all {@link FlowFile}s in a batch.
+ */
+public void resolveFlowFileToRowErrorAssociations() {
+return;
+}
+
+/**
+ * Checks whether there was a failure (i.e. either an {@link Exception} or 
{@link RowError} that happened during processing)
+ * @return {@code true} if there was a {@link Exception} or a {@link 
RowError} that happened during processing, {@code false} otherwise
+ */
+public abstract boolean 

Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-14 Thread via GitHub


dan-s1 commented on code in PR #8501:
URL: https://github.com/apache/nifi/pull/8501#discussion_r1525017249


##
nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKuduProcessor.java:
##
@@ -361,6 +363,27 @@ protected Collection 
customValidate(ValidationContext context)
 return results;
 }
 
+protected List flushKuduSession(final KuduSession kuduSession) 
throws KuduException {
+final List responses = kuduSession.flush();
+// RowErrors will only be present in the OperationResponses in this 
case if the flush mode
+// selected is MANUAL_FLUSH. It will be empty otherwise.
+return responses.stream()
+.filter(OperationResponse::hasRowError)
+.map(OperationResponse::getRowError)
+.collect(Collectors.toList());
+}
+
+protected List closeKuduSession(final KuduSession kuduSession) 
throws KuduException {
+final List responses = kuduSession.close();
+// RowErrors will only be present in the OperationResponses in this 
case if the flush mode
+// selected is MANUAL_FLUSH, since the underlying implementation of 
kuduSession.close() returns
+// the OperationResponses from a flush() call.
+return responses.stream()
+.filter(OperationResponse::hasRowError)
+.map(OperationResponse::getRowError)
+.collect(Collectors.toList());
+}

Review Comment:
   The only difference in these methods is the source of the responses not how 
the errors are collected. Refactor how the errors are collected into a common 
method.
   ```suggestion
   protected List flushKuduSession(final KuduSession kuduSession) 
throws KuduException {
   final List responses = kuduSession.flush();
   // RowErrors will only be present in the OperationResponses in this 
case if the flush mode
   // selected is MANUAL_FLUSH. It will be empty otherwise.
   return getRowErrors(responses);
   }
   
   protected List closeKuduSession(final KuduSession kuduSession) 
throws KuduException {
   final List responses = kuduSession.close();
   // RowErrors will only be present in the OperationResponses in this 
case if the flush mode
   // selected is MANUAL_FLUSH, since the underlying implementation of 
kuduSession.close() returns
   // the OperationResponses from a flush() call.
   return getRowErrors(responses);
   }
   
   private List getRowErrors(List responses) {
 return responses.stream()
   .filter(OperationResponse::hasRowError)
   .map(OperationResponse::getRowError)
   .collect(Collectors.toList());
   }
   ```



##
nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/StandardPutKuduResult.java:
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.kudu;
+
+import org.apache.kudu.client.Operation;
+import org.apache.kudu.client.RowError;
+import org.apache.nifi.flowfile.FlowFile;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+public class StandardPutKuduResult extends PutKuduResult {
+private final Map operationFlowFileMap;
+private final List pendingRowErrors;
+private final Map> flowFileRowErrorsMap;
+
+public StandardPutKuduResult() {
+super();
+this.operationFlowFileMap = new HashMap<>();
+this.pendingRowErrors = new ArrayList<>();
+this.flowFileRowErrorsMap = new HashMap<>();
+}
+
+@Override
+public void recordOperation(final Operation operation) {
+operationFlowFileMap.put(operation, flowFile);
+}
+
+@Override
+public void addError(final RowError rowError) {
+// When this class is used to store results from processing FlowFiles, 
the FlushMode
+// is set to AUTO_FLUSH_BACKGROUND or MANUAL_FLUSH. In either case, we 
won't know which
+

[PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-13 Thread via GitHub


emiliosetiadarma opened a new pull request, #8501:
URL: https://github.com/apache/nifi/pull/8501

   …LUSH_SYNC flush mode (unbatched flush)
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   # Summary
   
   [NIFI-12700](https://issues.apache.org/jira/browse/NIFI-12700)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI-12700) 
issue created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, 
as such `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [x] Build completed using `mvn clean install -P contrib-check`
 - [x] JDK 21
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_F… [nifi]

2024-03-13 Thread via GitHub


mattyb149 commented on PR #8322:
URL: https://github.com/apache/nifi/pull/8322#issuecomment-1995644417

   Reviewing...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org