Hello,

Our team has been working with NiFi for over a year. Our workload involves
processing 3-5 billion data items with NiFi. We found that
WriteAheadFlowFileRepository and FileSystemRepository cannot meet our
demands, so we put the data to be consumed in tmpfs and chose
VolatileFlowFileRepository and VolatileContentRepository to reduce I/O
costs and avoid the WAL. In our scenario, the data can be thrown away
when backpressure occurs or NiFi is restarted.

However, we have found three problems when working with
VolatileFlowFileRepository and VolatileContentRepository.
1. VolatileContentRepository
With maxSize = 100 MB and blockSize = 2 KB, there should be 51200 "slots".
If we write 1 KB at a time, we would expect 102400 writes of 1 KB to fit,
but on the 51201st write, "java.io.IOException: Content Repository is out of
space" is thrown. Here is the JUnit test I wrote:

@Test
public void test() throws IOException {
    System.setProperty(NiFiProperties.PROPERTIES_FILE_PATH,
            TestVolatileContentRepository.class.getResource("/conf/nifi.properties").getFile());
    final Map<String, String> addProps = new HashMap<>();
    addProps.put(VolatileContentRepository.BLOCK_SIZE_PROPERTY, "2 KB");
    final NiFiProperties nifiProps = NiFiProperties.createBasicNiFiProperties(null, addProps);
    final VolatileContentRepository contentRepo = new VolatileContentRepository(nifiProps);
    // claimManager is a field of the enclosing test class (not shown here)
    contentRepo.initialize(claimManager);
    // By size, 100 * 1024 KB / 1 KB = 102400 writes should fit, but the
    // blocks are exhausted on write number 51201
    for (int idx = 0; idx < 51201; ++idx) {
        final ContentClaim claim = contentRepo.create(true);
        try (final OutputStream out = contentRepo.write(claim)) {
            final byte[] oneK = new byte[1024];
            Arrays.fill(oneK, (byte) 55);

            out.write(oneK);
        }
    }
}
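
To make the arithmetic behind the test explicit, here is a minimal sketch
of the two ways of counting capacity. It is only a model of the behavior we
observe, not NiFi's actual implementation: the class and method names are
made up for illustration, and it assumes that every claim occupies whole
blocks that are never shared with another claim.

```java
public class BlockArithmetic {

    // Number of fixed-size blocks ("slots") that fit in the repository.
    public static long totalBlocks(long maxSizeBytes, long blockSizeBytes) {
        return maxSizeBytes / blockSizeBytes;
    }

    // Assumption: a claim always takes whole blocks, so its size is
    // rounded up to the next multiple of the block size.
    public static long blocksPerClaim(long claimBytes, long blockSizeBytes) {
        return (claimBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Claims that fit when counting whole blocks per claim.
    public static long maxClaimsByBlocks(long maxSizeBytes, long blockSizeBytes, long claimBytes) {
        return totalBlocks(maxSizeBytes, blockSizeBytes) / blocksPerClaim(claimBytes, blockSizeBytes);
    }

    // Claims that would fit if only the bytes actually written counted.
    public static long maxClaimsByBytes(long maxSizeBytes, long claimBytes) {
        return maxSizeBytes / claimBytes;
    }

    public static void main(String[] args) {
        final long maxSize = 100L * 1024 * 1024; // 100 MB
        final long blockSize = 2L * 1024;        // 2 KB
        final long claimSize = 1024;             // 1 KB per write

        System.out.println("by blocks: " + maxClaimsByBlocks(maxSize, blockSize, claimSize));
        System.out.println("by bytes:  " + maxClaimsByBytes(maxSize, claimSize));
    }
}
```

Under that assumption the block-based count is 51200 and the byte-based
count is 102400, which matches the failure on write 51201. Note that the
test never releases a claim, so every write keeps its block.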

2. VolatileFlowFileRepository
When backpressure occurs, FileSystemSwapManager swaps FlowFiles out to disk
whenever the swap queue size exceeds 10000. There is no problem in the
swap-out process, BUT in the swap-in process VolatileFlowFileRepository
does not "acknowledge" the FlowFiles that were swapped out. When
FileSystemSwapManager tries to swap FlowFiles back in from disk, it logs
the warning "Cannot swap in FlowFiles from location..." because the
implementation of "isValidSwapLocationSuffix" in VolatileFlowFileRepository
always returns FALSE.
The queue still appears FULL in the NiFi frontend and the upstream
processor is STUCK, perhaps because FileSystemSwapManager "thinks" these
FlowFiles have still not been consumed.
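
The suspected mechanism can be illustrated with a small stand-alone model.
This is not NiFi code: the FlowFileRepo interface, the TrackingRepo class,
the trySwapIn helper and the example swap file name are all hypothetical,
and only the method name isValidSwapLocationSuffix is taken from NiFi. It
assumes the swap-in path asks the FlowFile repository to validate the swap
location before reading it back.

```java
import java.util.HashSet;
import java.util.Set;

public class SwapInSketch {

    // Hypothetical stand-in for the part of the repository API involved.
    public interface FlowFileRepo {
        boolean isValidSwapLocationSuffix(String suffix);
    }

    // Models the behavior described above: every location is rejected.
    public static final FlowFileRepo VOLATILE_LIKE = suffix -> false;

    // Models a repository that remembers the locations it swapped out to.
    public static class TrackingRepo implements FlowFileRepo {
        private final Set<String> known = new HashSet<>();

        public void swappedOut(String suffix) {
            known.add(suffix);
        }

        @Override
        public boolean isValidSwapLocationSuffix(String suffix) {
            return known.contains(suffix);
        }
    }

    // Hypothetical swap-in check: returns true only if the repository
    // recognizes the file name part of the swap location.
    public static boolean trySwapIn(FlowFileRepo repo, String location) {
        final String suffix = location.substring(location.lastIndexOf('/') + 1);
        return repo.isValidSwapLocationSuffix(suffix);
    }

    public static void main(String[] args) {
        final String location = "swap/example-queue.swap"; // made-up path

        final TrackingRepo tracking = new TrackingRepo();
        tracking.swappedOut("example-queue.swap");

        System.out.println("tracking repo: " + trySwapIn(tracking, location));
        System.out.println("volatile-like: " + trySwapIn(VOLATILE_LIKE, location));
    }
}
```

In this model the tracking repository accepts the swap-in while the
always-false one refuses it, so the swapped-out FlowFiles would never
return to the queue. That would be consistent with the queue staying full.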

3. We found that NiFi cannot stay up for more than a week, even when we
use WriteAheadFlowFileRepository and FileSystemRepository. NiFi got stuck:
it did not process any data and there was no output in nifi-app.log. After
we restarted NiFi it was back to normal, but we do not know what happened.

Many thanks