Hi Koji, that seems like a pretty good idea, thanks for bringing it up! I wasn't aware of nanofi but will definitely give it a shot. =)
Thanks

On Wed, Apr 10, 2019 at 22:38, Koji Kawamura <[email protected]> wrote:

> Hi Eric,
>
> Although my knowledge of MiNiFi, Python and Go is limited, I wonder if
> the "nanofi" library could be used from the proprietary applications so that
> they can fetch FlowFiles directly using the Site-to-Site protocol. That
> could be an interesting approach, and it would eliminate the need to
> store data on a local volume (mentioned in possible approach A).
> https://github.com/apache/nifi-minifi-cpp/tree/master/nanofi
>
> The latest MiNiFi (C++) version, 0.6.0, was released recently.
> https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes
>
> Thanks,
> Koji
>
> On Thu, Apr 11, 2019 at 2:28 AM Eric Chaves <[email protected]> wrote:
> >
> > Hi Folks,
> >
> > My company is using NiFi to perform several data-flow processes, and we have
> > now received a requirement to do some fairly complex ETL over large files. To
> > process those files we have some proprietary applications (mostly written
> > in Python or Go) that run as Docker containers.
> >
> > I don't think that porting those apps to NiFi processors would produce a
> > good result, given each app's complexity.
> >
> > We would also like to keep using the NiFi queues so we can monitor overall
> > progress as we already do (we run several other NiFi flows), so for now we
> > are ruling out solutions that, for example, submit files to an external
> > queue like SQS or RabbitMQ for consumption.
> >
> > So far we have come up with a design that would:
> >
> > 1. Have a Kubernetes cluster of running jobs periodically query the NiFi
> >    queue for new FlowFiles and pull one when a file arrives.
> > 2. Download the file content (which is already stored outside of NiFi) and
> >    process it.
> > 3. Submit the result back to NiFi (using an HTTP listener processor) to
> >    trigger subsequent NiFi processing.
> >
> > For steps 1 and 2 we are currently considering two possible approaches:
> >
> > A) Use a MiNiFi container together with the app container in a sidecar
> > design. MiNiFi would connect to our NiFi cluster and download files to a
> > local volume for the app container to process.
> >
> > B) Use the NiFi REST API to query and consume FlowFiles from the queue.
> >
> > One requirement is that, if needed, we could manually scale up the app
> > cluster so that multiple containers consume queued files in parallel.
> >
> > Do you recommend one over the other (or a third approach)? Any
> > pitfalls you can foresee?
> >
> > I would be really glad to hear your thoughts on this matter.
> >
> > Best regards,
> >
> > Eric
>
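To make approach B and step 3 more concrete, here is a minimal Python sketch (Python because several of the apps are written in it). It assumes the NiFi REST API's queue-listing endpoints (`/flowfile-queues/{id}/listing-requests` and `/flowfile-queues/{id}/flowfiles/{uuid}/content`), a listing response shaped like `{"listingRequest": {"finished": ..., "flowFileSummaries": [{"uuid": ...}]}}`, and a ListenHTTP processor on its default `contentListener` base path. The function names, host and port 8081 are hypothetical, authentication is omitted, and the paths should be checked against the REST API documentation for your NiFi version.

```python
import json
import urllib.request

# Hypothetical ListenHTTP endpoint for step 3: host and port are placeholders,
# and "contentListener" is assumed to be the processor's default Base Path.
LISTENER_URL = "http://nifi.example.com:8081/contentListener"


def listing_request_url(api_base: str, queue_id: str) -> str:
    """URL to POST to in order to start listing a connection's queue (assumed path)."""
    return f"{api_base}/flowfile-queues/{queue_id}/listing-requests"


def content_url(api_base: str, queue_id: str, flowfile_uuid: str) -> str:
    """URL to GET the content of a single queued FlowFile (assumed path)."""
    return f"{api_base}/flowfile-queues/{queue_id}/flowfiles/{flowfile_uuid}/content"


def queued_uuids(listing_body: str) -> list:
    """Return the FlowFile UUIDs from a listing-request response body.

    Listing is asynchronous: until "finished" is true the summaries may be
    incomplete, so an empty list tells the caller to poll again later.
    """
    listing = json.loads(listing_body)["listingRequest"]
    if not listing.get("finished", False):
        return []
    return [summary["uuid"] for summary in listing.get("flowFileSummaries", [])]


def result_request(listener_url: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST that hands a processed result back to NiFi."""
    return urllib.request.Request(
        listener_url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
```

Sending the result is then `urllib.request.urlopen(result_request(LISTENER_URL, data))`. Note that listing a queue only inspects it and does not remove FlowFiles, so when several containers poll the same queue they still need some way to agree on which container processes which FlowFile.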
