While trying to figure out why a `MergeContent` processor was producing a linearly growing amount of content that wasn't being reaped correctly (the retention policies were not upheld and free disk space would fall to zero), we realized that some flow files in the queue pointed to content that was effectively missing on disk: the corresponding file in the content repository was zero bytes.
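For anyone wanting to check for the same symptom, here is a minimal sketch of how zero-byte files could be located. It only walks a directory tree and reports empty regular files; it uses no NiFi API. The function name `find_zero_byte_files` and the example path are assumptions for illustration, and a zero-length file is only a candidate for inspection, not proof of corruption.

```python
import os
from pathlib import Path


def find_zero_byte_files(repo_root):
    """Return the sorted paths of all zero-length regular files under repo_root.

    repo_root is assumed to be the content repository directory; the real
    location depends on the installation.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.is_file() and path.stat().st_size == 0:
                    empty.append(path)
            except OSError:
                # The file may have been removed while we were scanning.
                continue
    return sorted(empty)
```

It could be called as `find_zero_byte_files("./content_repository")`, with the path adjusted to wherever the content repository actually lives.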
How might this have happened, and if it does happen, shouldn't processors be able to recover from it somehow? What seems to happen is that the flow file goes right back into the queue, where it will of course fail again. Further, a simple grep seems to show that references to the ID of the empty content file appear in many other files in the content repository. This seems to suggest that none of this content can be reaped because it is still being referenced somehow and thus isn't eligible for archival and/or deletion. Thanks for any ideas.