Hey James,

Sorry no one responded when you first sent the message, but I'm curious what you ended up doing and what you found. I also wanted to bring this thread back to the attention of the larger group, as it raises some interesting questions I haven't seen discussed elsewhere.
On the topic of re-sorting the queue, I was curious about the answer, so I dug down to the StandardFlowFileQueue and found that it's primarily just wrapping an instance of Java's PriorityQueue for its active queue[1]. This means ordering is maintained every time a FlowFile is enqueued, but since PriorityQueue is a binary heap, that is an O(log n) insert rather than a full re-sort of the queue, and we always have immediate access to the head of the queue. I'm sure someone else (Mark Payne?) could better explain how we make use of the nuances of the queue for better performance, and the impact the different queue prioritizers have.

As for higher priority FlowFiles starving out lower priority ones, I'm thinking about a way to assign a weight instead of a priority. In essence, a "weighted funnel processor" that grabs X FlowFiles each time but has a weight assigned to each category, so that it takes a certain number from each category based on that weight. That said, I'm not sure it would be guaranteed to work when FlowFiles in the queue are swapped out, since even if we iterated over everything in the incoming connection, there would still be others swapped to disk. There are also probably performance concerns if we tried to implement it using the tools currently offered to a processor.

For the separate NiFis approach, I'm curious what others think. Personally, it makes sense to me that for flows that are dramatically different in priority you'd want to section them off to another instance of NiFi. It's essentially the separation between data-plane and control-plane instances of NiFi.

Lastly, James, I assume you're limited to the 0.7.x release for a specific reason? I'd highly suggest upgrading to the latest version whenever possible. There are many security and performance improvements, and of course many new features.
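To make the weighted funnel idea above a bit more concrete, here's a rough sketch in plain Java. It is not NiFi code and doesn't use any NiFi API: `WeightedFunnelSketch`, `nextBatch`, `filled`, and the category names are all made up for illustration, and strings stand in for FlowFiles. It assumes the items are already bucketed by category in memory and that weights are positive integers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not NiFi code: every name here is made up for illustration.
class WeightedFunnelSketch {

    // Builds a queue of n placeholder items named prefix-0, prefix-1, ...
    static Deque<String> filled(String prefix, int n) {
        Deque<String> q = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            q.add(prefix + "-" + i);
        }
        return q;
    }

    // Pulls up to batchSize items. Each weighted category gets a share of the
    // batch proportional to its weight (assumed positive), with a floor of one
    // item so that a low-weight category is never skipped entirely.
    static List<String> nextBatch(Map<String, Deque<String>> queues,
                                  Map<String, Integer> weights,
                                  int batchSize) {
        List<String> batch = new ArrayList<>();
        int totalWeight = 0;
        for (int w : weights.values()) {
            totalWeight += w;
        }
        if (totalWeight <= 0) {
            return batch;
        }

        // First pass: take each category's weighted quota, in map order.
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            Deque<String> q = queues.get(e.getKey());
            if (q == null) {
                continue;
            }
            int quota = Math.max(1, batchSize * e.getValue() / totalWeight);
            while (quota-- > 0 && !q.isEmpty() && batch.size() < batchSize) {
                batch.add(q.poll());
            }
        }

        // Second pass: if a category ran dry, fill the remaining slots from
        // whatever is still queued (including categories with no weight).
        for (Deque<String> q : queues.values()) {
            while (batch.size() < batchSize && !q.isEmpty()) {
                batch.add(q.poll());
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> queues = new LinkedHashMap<>();
        queues.put("p1", filled("p1", 100));
        queues.put("p2", filled("p2", 100));
        queues.put("p3", filled("p3", 100));

        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("p1", 6);
        weights.put("p2", 3);
        weights.put("p3", 1);

        System.out.println(nextBatch(queues, weights, 10));
    }
}
```

The floor of one item per weighted category is what keeps the priority 2s and 3s moving even under a sustained flood of priority 1s. The sketch only sees what is in memory, so it has the same blind spot mentioned above for FlowFiles that have been swapped out to disk.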
[1] https://github.com/apache/nifi/blob/7f4cfd51ea07ead6c9b71b6c6d6f87a352b801d3/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardFlowFileQueue.java#L89

Joe

On Thu, Oct 19, 2017 at 8:58 AM, James McMahon <[email protected]> wrote:

> Our team is considering ways to accelerate delivery for certain subsets of
> content we process through NiFi. We are using Apache NiFi 0.7.x as our
> baseline.
>
> This link discusses a recommended approach to content prioritization using
> PriorityAttributePrioritizer on a connector (queue) to tailor throughput
> based on a priority attribute we set upstream in our flow:
>
> https://stackoverflow.com/questions/42528993/how-to-specify-priority-attributes-for-individual-flowfiles
>
> How often does the connector queue have to re-sort contents in order to
> enforce our priority attribute? Is it re-sorting *every* single time new
> flowFiles hit the queue? Won't that markedly and negatively impact
> performance?
>
> If our priority 1s are a huge volume of flowfiles that persists over time,
> won't this approach cause our priority 2s, 3s, etc etc to languish in queue?
>
> The described approach seems to embed significant business logic in the
> NiFi workflows. In an environment where priorities change often, would that
> be considered a poor approach? Might it be better to enforce priority
> processing at a higher architectural level - a lightweight NiFi server to
> accelerate delivery of priority one content and email alerts, a priority
> two suite of NiFi servers for standard flowfile volume, a priority three
> suite of servers to handle long-term bulk processing, etc etc?
>
> Thanks in advance for your help. -Jim
>

--
*Joe Percivall*
linkedin.com/in/Percivall
e: [email protected]
