[jira] [Commented] (NIFI-12700) PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)

2024-03-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827538#comment-17827538
 ] 

ASF subversion and git services commented on NIFI-12700:


Commit 3719fddf84ffcf18ed29d87ee931aa1acc1699d3 in nifi's branch 
refs/heads/main from emiliosetiadarma
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3719fddf84 ]

NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_FLUSH_SYNC 
flush mode (unbatched flush)

NIFI-12700: made changes based on PR comments. Simplified statements involving 
determination of whether or not there are flowfile failures/rowErrors. 
Separated out getting rowErrors from OperationResponses into its own function

Signed-off-by: Matt Burgess 

This closes #8322


> PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
> --
>
> Key: NIFI-12700
> URL: https://issues.apache.org/jira/browse/NIFI-12700
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Emilio Setiadarma
> Assignee: Emilio Setiadarma
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The PutKudu processor's existing implementation uses a Map of KuduOperation 
> -> FlowFile to keep track of which FlowFile was being processed when the 
> KuduOperation was created. This mapping is eventually used to associate 
> FlowFiles with the RowError (if any occurs), which is necessary for 
> transferring FlowFiles to the success/failure relationships and for logging 
> failures, among other things. 
> For very large inputs, KuduOperation objects can grow very large. There is 
> no memory leak, but this could still cause OutOfMemory issues with very 
> large input data. It may be possible to avoid the KuduOperation -> FlowFile 
> map for unbatched flush modes (e.g. when using the AUTO_FLUSH_SYNC flush 
> mode, where KuduSession.apply() will have already flushed the buffer before 
> returning; see 
> https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
> This Jira captures the effort to refactor the PutKudu processor to make it 
> more memory efficient.
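
The idea in the description can be illustrated with a minimal sketch. This is 
not the committed PutKudu code: Op, Response, SyncSession and writeUnbatched 
are hypothetical stand-ins that only loosely mirror the shape of the Kudu 
client (an apply() call that returns a response which may carry a row error). 
The assumption is that in a synchronous, unbatched flush mode each apply() 
returns its result immediately, so an error can be attributed to the FlowFile 
currently being processed without keeping an operation -> FlowFile map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UnbatchedFlushSketch {

    /** Hypothetical stand-in for a Kudu operation; 'row' models the row data it holds. */
    static final class Op {
        final String row;
        Op(String row) { this.row = row; }
    }

    /** Hypothetical stand-in for the response of a synchronous apply(). */
    static final class Response {
        final String rowError; // null when the write succeeded
        Response(String rowError) { this.rowError = rowError; }
        boolean hasRowError() { return rowError != null; }
    }

    /** Hypothetical stand-in for a session in an unbatched (synchronous) flush mode. */
    interface SyncSession {
        Response apply(Op op);
    }

    /**
     * Writes every row of every FlowFile and returns the row errors grouped by
     * FlowFile id. No Op -> FlowFile map is kept: each Op goes out of scope as
     * soon as its response has been inspected, so only errors are retained.
     */
    static Map<String, List<String>> writeUnbatched(
            SyncSession session, Map<String, List<String>> rowsByFlowFile) {
        Map<String, List<String>> errorsByFlowFile = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : rowsByFlowFile.entrySet()) {
            String flowFileId = entry.getKey();
            for (String row : entry.getValue()) {
                // The response is available right away, so the current
                // FlowFile is known to be the one the error belongs to.
                Response response = session.apply(new Op(row));
                if (response.hasRowError()) {
                    errorsByFlowFile
                            .computeIfAbsent(flowFileId, k -> new ArrayList<>())
                            .add(response.rowError);
                }
            }
        }
        return errorsByFlowFile;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("ff-1", Arrays.asList("a", "b"));
        rows.put("ff-2", Arrays.asList("bad-row", "c"));

        // Toy session that rejects any row whose text starts with "bad".
        SyncSession session = op -> new Response(
                op.row.startsWith("bad") ? "rejected: " + op.row : null);

        System.out.println(writeUnbatched(session, rows)); // {ff-2=[rejected: bad-row]}
    }
}
```

In a batched flush mode the responses arrive later and for many operations at 
once, which is why a lookup from operation back to FlowFile is still needed 
there; the sketch covers only the unbatched case.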



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-12700) PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)

2024-03-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827537#comment-17827537
 ] 

ASF subversion and git services commented on NIFI-12700:


Commit 37eb52d75fdcfe57104b5ec5f5db56ac870c8581 in nifi's branch 
refs/heads/support/nifi-1.x from emiliosetiadarma
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=37eb52d75f ]

NIFI-12700: refactored PutKudu to optimize memory handling for AUTO_FLUSH_SYNC 
flush mode (unbatched flush)

NIFI-12700: made changes based on PR comments. Simplified statements involving 
determination of whether or not there are flowfile failures/rowErrors. 
Separated out getting rowErrors from OperationResponses into its own function

Signed-off-by: Matt Burgess 

This closes #8501


> PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
> --
>
> Key: NIFI-12700
> URL: https://issues.apache.org/jira/browse/NIFI-12700
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Emilio Setiadarma
> Assignee: Emilio Setiadarma
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h


