[jira] [Commented] (NIFI-8760) VolatileContentRepository fails to retrieve content from claims with several processors

2021-10-28 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17435488#comment-17435488
 ] 

Joe Witt commented on NIFI-8760:


Removed fix version until progress on review/discussion is made.

> VolatileContentRepository fails to retrieve content from claims with several 
> processors
> ---
>
>                 Key: NIFI-8760
>                 URL: https://issues.apache.org/jira/browse/NIFI-8760
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.14.0, 1.13.1, 1.13.2, 1.15.0
>            Reporter: Matthieu RÉ
>            Priority: Major
>              Labels: content-repository, volatile
>         Attachments: 
> 0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch, 
> 0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch, flow.xml.gz, 
> nifi.properties
>
>
> For several processors such as MergeRecord, QueryRecord and SplitJson, using 
> the VolatileContentRepository implementation causes errors while retrieving 
> FlowFiles from claims. The following logs were generated using NiFi 1.13.1 
> from Docker with the attached flow.xml.gz and nifi.properties files.
> MergeRecord (with JsonTreeReader, JsonRecordSetWriter with default 
> configuration):
> {{2021-07-06 10:15:09,170 ERROR [Timer-Driven Process Thread-1] 
> o.a.nifi.processors.standard.MergeRecord 
> MergeRecord[id=7b425cff-017a-1000-6a20-58c4e064df3d] Failed to bin 
> StandardFlowFileRecord[uuid=3e894a96-883a-4ac2-8121-b8200964cf20,claim=StandardContentClaim
>  [resourceClaim=StandardResourceClaim[id=6, container=in-memory, 
> section=section], offset=0, 
> length=5655],offset=0,name=b2c7cf61-b421-477d-902e-daeb2ed58f0d,size=5655] 
> due to org.apache.nifi.controller.repository.ContentNotFoundException: Could 
> not find content for StandardContentClaim 
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory, 
> section=section], offset=0, length=-1]: 
> org.apache.nifi.controller.repository.ContentNotFoundException: Could not 
> find content for StandardContentClaim 
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory, 
> section=section], offset=0, length=-1]}}
>  {{org.apache.nifi.controller.repository.ContentNotFoundException: Could not 
> find content for StandardContentClaim 
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory, 
> section=section], offset=0, length=-1]}}
>  {{at 
> org.apache.nifi.controller.repository.VolatileContentRepository.getContent(VolatileContentRepository.java:445)}}
>  {{at 
> org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:468)}}
>  {{at 
> org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:473)}}
>  {{at 
> org.apache.nifi.controller.repository.StandardProcessSession.getInputStream(StandardProcessSession.java:2302)}}
>  {{at 
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2409)}}
>  {{at 
> org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:383)}}
>  {{at 
> org.apache.nifi.processors.standard.MergeRecord.onTrigger(MergeRecord.java:346)}}
>  {{at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
>  {{at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
>  {{at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
>  {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
>  {{at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
>  {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
>  {{at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
>  {{at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
>  {{at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
>  {{at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
>  {{at java.lang.Thread.run(Thread.java:748)}}
> QueryRecord:
> {{2021-07-06 10:15:09,174 ERROR [Timer-Driven Process Thread-4] 
> o.a.nifi.processors.standard.QueryRecord 
> QueryRecord[id=673fe9f6-017a-1000-8041-dfde9d02d976] Failed to determine 
> Record Schema from 
> StandardFlowFileRecord[uuid=090e3058-67e6-4436-bea9-d511132848e3,claim=StandardContentClaim
>  [resourceClaim=StandardResourceClaim[id=2, container=in-memory, 
> section=section], offset=0, 
> length=5655],offset=0,name=090e3058-67e6-4436-bea9-d511132848e3,size=5655]; 
> routing to failure: 
> org.apache.nifi.controller.repository.ContentNotFoundException: Could 

[jira] [Commented] (NIFI-8760) VolatileContentRepository fails to retrieve content from claims with several processors

2021-09-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/NIFI-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422674#comment-17422674
 ] 

Matthieu RÉ commented on NIFI-8760:
---

Today I have two simple fixes that perform equivalently (tested on 
GenerateFF, MergeRecord, SplitJson and QueryRecord):


 * The first is to follow [the idea of the first 
implementation|https://github.com/apache/nifi/blob/528fce2407d092d4ced1a58fcc14d0bc6e660b89/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/VolatileContentRepository.java#L473],
 in which a ResourceClaim is read by looking up the corresponding ContentClaim 
at offset 0. This doesn't work when the stored ContentClaim has a length, 
because the ContentClaim implements an "equalsTo" that takes the length into 
account, and the constructor called by read(ResourceClaim) initializes the 
length to -1. So a fix could be to search the map for the ContentClaim matching 
the ResourceClaim and offset 0.
As I said, even though this implementation seems poor, since it does not 
benefit from the structure of the Map of Comparable keys when searching for a 
ContentClaim, its performance seems equivalent to that of the second fix.
 * The second is simply to treat the VolatileContentRepository as incompatible 
with read(ResourceClaim) and to allow only read(ContentClaim), as is the 
case for the EncryptedFileSystemRepository.
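
To illustrate the mismatch and the two fixes described above, here is a minimal 
sketch. It is not the NiFi code: ClaimLookupSketch, Claim and the three read 
methods are hypothetical stand-ins, and a plain HashMap with equals/hashCode 
replaces the repository's actual map of Comparable keys. The id 6 and the 
length 5655 are taken from the MergeRecord log in the issue description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ClaimLookupSketch {

    /** Hypothetical stand-in for a content claim; equality includes the length. */
    public static final class Claim {
        final long resourceId;
        final long offset;
        final long length;

        public Claim(long resourceId, long offset, long length) {
            this.resourceId = resourceId;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Claim)) return false;
            Claim other = (Claim) o;
            return resourceId == other.resourceId
                    && offset == other.offset
                    && length == other.length;
        }

        @Override
        public int hashCode() {
            return Objects.hash(resourceId, offset, length);
        }
    }

    /** Hypothetical stand-in for the in-memory claim-to-content map. */
    private final Map<Claim, byte[]> store = new HashMap<>();

    public void write(Claim claim, byte[] content) {
        store.put(claim, content);
    }

    /** Failing path: builds a claim with length -1 and looks it up by equality. */
    public byte[] readByResourceExact(long resourceId) {
        return store.get(new Claim(resourceId, 0L, -1L));
    }

    /** Sketch of the first fix: scan for resource id and offset 0, ignoring length. */
    public byte[] readByResourceScan(long resourceId) {
        for (Map.Entry<Claim, byte[]> entry : store.entrySet()) {
            Claim c = entry.getKey();
            if (c.resourceId == resourceId && c.offset == 0L) {
                return entry.getValue();
            }
        }
        return null;
    }

    /** Sketch of the second fix: refuse resource-level reads altogether. */
    public byte[] readByResourceUnsupported(long resourceId) {
        throw new UnsupportedOperationException(
                "read by resource claim is not supported in this sketch");
    }

    public static void main(String[] args) {
        ClaimLookupSketch repo = new ClaimLookupSketch();
        repo.write(new Claim(6L, 0L, 5655L), new byte[5655]);

        // The stored key has length 5655, so a key with length -1 does not match.
        System.out.println("exact lookup found: " + (repo.readByResourceExact(6L) != null));
        // Ignoring the length finds the entry, at the cost of a linear scan.
        System.out.println("scan lookup found: " + (repo.readByResourceScan(6L) != null));
    }
}
```

In this sketch the first fix trades the keyed lookup for a linear scan over the 
map, while the second removes the ambiguous code path entirely.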

Since the data storage in this implementation is structured as a 
Map, I lack the experience to answer these questions:
 * Does it make sense to use the ResourceClaim to look up ContentBlock(s) 
in a VolatileContentRepository?
 * If yes, could there be a benefit in looking up ContentBlocks at every offset 
matching the ResourceClaim, instead of only offset 0 as originally intended?
 * If not, the second fix is probably the right one.

Please don't hesitate to correct me if I'm wrong or have misunderstood something.

For now, I am attaching the second fix as a Git patch here: 
[^0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch], to help 
anyone in need of a fix.

> VolatileContentRepository fails to retrieve content from claims with several 
> processors
> ---
>
>                 Key: NIFI-8760
>                 URL: https://issues.apache.org/jira/browse/NIFI-8760
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.13.1, 1.13.2
>            Reporter: Matthieu RÉ
>            Priority: Major
>              Labels: content-repository, volatile
>         Attachments: 
> 0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch, flow.xml.gz, 
> nifi.properties
>
>