Charlie,

"1. Would it be more efficient to let the files queue up, or to try to match the process rate with timing or back pressure?"

From an efficiency perspective, each transaction has a cost. What you have to consider is whether you want lower latency or higher throughput. The default settings of NiFi generally favor lower latency. On many processors you can slide the 'run duration' slider on the scheduling tab to the right a bit, and the framework can then batch multiple transactions into one over, say, a 25 millisecond period.

"2. I've made the suggested system edits in the administrator's guide, as well as increasing xms and xmx somewhat. Any additional suggestions?"

If the flowfile, content, and provenance repositories, the logs, and the OS are all on the same physical partition, performance will be impacted. If you're trying to achieve higher events/second or higher MB/s of data into and out of the system as a whole, separating them is a great place to start. On rather modest enterprise servers you should expect tens if not hundreds of thousands of events per second and more than 100 MB/s of throughput.

"3. Somewhat related, I think that logging / provenance is eating up disk space. My install directory stays large, even after processing has finished, until I stop and start NiFi which significantly reduces the disk in use."

The default provenance and logging settings will eat up disk space and disk utilization. You can turn logging down to a less verbose level so that less is logged, and you can flip provenance into an in-memory mode if you need or want to.

"4. Maybe unrelated, what does the number that appears in a little white box at the top right of a processor indicate? It seems to show up on processors that have a large queue in front of them."

That tells you how many threads the framework has outstanding for that processor at that time. It makes sense that the number is often associated with processors that have a queue.

The key thing to understand is which aspects of resource utilization are high:
- CPU? If so, how are you measuring that?
- Network? Do you have a 1 Gb/s NIC and are seeing less than 120 MB/s throughput?
- Memory?
  How large is the system memory, and how large a heap have you allocated to Java?
- Disk? How many physical disks do you have, and what is the utilization while running? How are you measuring this?

Questions:
- What performance are you looking for, and how far off is that from what you're seeing?
- Do you have a flow template you can share so we could help identify any potential problems?

Thanks
Joe

On Sun, Dec 6, 2015 at 1:23 AM, Charlie Frasure <[email protected]> wrote:
> I sent this from the wrong email account a few days ago, but am still
> interested in any thoughts.
>
> Got an interesting message that prompted me to follow up on a few
> questions I have. The message is the screenshot below (if it works). It
> says: "WARNING The rate of the dataflow is exceeding the provenance
> recording rate. Slowing down flow to accommodate."
>
> [image: Inline image 1]
>
> This flow has queued up a few hundred thousand files (mostly very small)
> and I'm not sure that's ideal. I read that there is some automatic
> swapping that takes place at 20k file queues. It does eventually process
> the files, but I would like to make sure we're taking advantage of any
> performance options.
>
> 1. Would it be more efficient to let the files queue up, or to try to
> match the process rate with timing or back pressure?
>
> 2. I've made the suggested system edits in the administrator's guide, as
> well as increasing xms and xmx somewhat. Any additional suggestions?
>
> 3. Somewhat related, I think that logging / provenance is eating up disk
> space. My install directory stays large, even after processing has
> finished, until I stop and start NiFi which significantly reduces the disk
> in use.
>
> 4. Maybe unrelated, what does the number that appears in a little white
> box at the top right of a processor indicate? It seems to show up on
> processors that have a large queue in front of them.
>
> [image: Inline image 2]
>
> Thanks,
> Charlie
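
For points 2 and 3 in the reply, the settings involved can be sketched roughly as follows, assuming a stock conf/nifi.properties. The /data1, /data2 and /data3 mount points are hypothetical, and the values shown are only illustrative. Property names and defaults can differ between NiFi releases, so check them against the nifi.properties shipped with your version.

```properties
# conf/nifi.properties (sketch only; /data1, /data2, /data3 are hypothetical mount points)

# Point each repository at its own physical partition
nifi.flowfile.repository.directory=/data1/flowfile_repository
nifi.content.repository.directory.default=/data2/content_repository
nifi.provenance.repository.directory.default=/data3/provenance_repository

# Bound how much provenance data is retained on disk (illustrative values)
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB

# Alternatively, keep provenance in memory only; events do not survive a restart
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
```

Logging verbosity is configured separately in conf/logback.xml, and the heap size (-Xms / -Xmx) in conf/bootstrap.conf. Changes to these files take effect after NiFi is restarted.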
