Hi Eric,

Although my knowledge of MiNiFi, Python and Go is limited, I wonder if
the "nanofi" library can be used from the proprietary applications so
that they can fetch FlowFiles directly using the Site-to-Site protocol.
That could be an interesting approach, and it would eliminate the need
to store data on a local volume (mentioned in possible approach A).
https://github.com/apache/nifi-minifi-cpp/tree/master/nanofi

The latest MiNiFi (C++) version 0.6.0 was released recently.
https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes

Thanks,
Koji

On Thu, Apr 11, 2019 at 2:28 AM Eric Chaves <[email protected]> wrote:
>
> Hi Folks,
>
> My company is using NiFi for several data-flow processes, and we have now 
> received a requirement to do some fairly complex ETL over large files. To 
> process those files we have some proprietary applications (mostly written in 
> Python or Go) that run as Docker containers.
>
> I don't think that porting those apps to NiFi processors would produce a good 
> result, due to each app's complexity.
>
> We would also like to keep using the NiFi queues so we can monitor overall 
> progress as we already do (we run several other NiFi flows), so for now we are 
> discarding solutions that, for example, submit files to an external queue 
> like SQS or RabbitMQ for consumption.
>
> So far we have come up with a design that would:
>
> 1. have a Kubernetes cluster of jobs periodically querying the NiFi queue 
> for new FlowFiles, pulling one when a file arrives;
> 2. download the file content (which is already stored outside of NiFi) and 
> process it;
> 3. submit the result back to NiFi (using an HTTP listener processor) to 
> trigger subsequent NiFi processing.
>
>
> For steps 1 and 2, so far we are considering two possible approaches:
>
> A) use a MiNiFi container together with the app container in a sidecar 
> design. MiNiFi would connect to our NiFi cluster and handle downloading 
> files to a local volume for the app container to process.
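For approach A, the sidecar's flow could be expressed as a MiNiFi config.yml along these lines. This is only a sketch against the Config Version 3 schema; the ids, URL, and port name are placeholders, the output-port id must match an actual output port on your NiFi instance, and the exact field names should be double-checked against your MiNiFi version:

```yaml
MiNiFi Config Version: 3
Flow Controller:
  name: sidecar-fetch
Processors:
- id: putfile-1
  name: PutFile
  class: org.apache.nifi.processors.standard.PutFile
  scheduling strategy: TIMER_DRIVEN
  scheduling period: 1 sec
  Properties:
    Directory: /shared/incoming        # volume shared with the app container
Connections:
- id: conn-1
  name: port-to-putfile
  source id: aaaaaaaa-0000-0000-0000-000000000000   # placeholder port id
  source relationship names: []
  destination id: putfile-1
Remote Process Groups:
- id: rpg-1
  name: nifi-cluster
  url: http://nifi.example.com:8080/nifi            # placeholder NiFi URL
  timeout: 30 secs
  Output Ports:
  - id: aaaaaaaa-0000-0000-0000-000000000000        # must match the NiFi output port id
    name: to-minifi
    max concurrent tasks: 1
    use compression: false
```

The idea is that MiNiFi pulls from a NiFi output port over Site-to-Site and writes to the shared volume, so the app container only ever watches a directory.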
>
> B) use the NiFi REST API to query and consume FlowFiles on the queue.
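For approach B, the relevant NiFi 1.x endpoints are the asynchronous queue listing requests; a rough sketch (base URL and connection id are placeholders, and the response field names should be verified against your NiFi version). One caveat worth flagging: as far as I know the listing API is intended for inspection, and there is no endpoint to dequeue a single FlowFile (only queue-wide drop-requests), so truly "consuming" via REST needs care:

```python
import json
import time
import urllib.request

def listing_request_url(base, connection_id):
    """Endpoint that creates/polls asynchronous queue listing requests."""
    return f"{base}/nifi-api/flowfile-queues/{connection_id}/listing-requests"

def content_url(base, connection_id, flowfile_uuid):
    """Endpoint that streams a listed FlowFile's content."""
    return (f"{base}/nifi-api/flowfile-queues/{connection_id}"
            f"/flowfiles/{flowfile_uuid}/content")

def _get_json(url, method="GET"):
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def list_queue(base, connection_id):
    """Create a listing request, poll until finished, return FlowFile summaries."""
    url = listing_request_url(base, connection_id)
    listing = _get_json(url, method="POST")["listingRequest"]
    poll = f"{url}/{listing['id']}"
    while not listing.get("finished"):
        time.sleep(0.5)
        listing = _get_json(poll)["listingRequest"]
    # clean up the finished listing request
    urllib.request.urlopen(urllib.request.Request(poll, method="DELETE"))
    return listing["flowFileSummaries"]

# Hypothetical usage:
#   for ff in list_queue("http://nifi.example.com:8080", "<connection-id>"):
#       body = urllib.request.urlopen(
#           content_url("http://nifi.example.com:8080",
#                       "<connection-id>", ff["uuid"]))
```

On a clustered NiFi, the content endpoint also needs the `clusterNodeId` query parameter from the summary, which this sketch omits.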
>
> One requirement is that, if needed, we can manually scale up the app cluster 
> so that multiple containers consume more queued files in parallel.
>
> Do you recommend one over the other (or a third approach)? Any pitfalls 
> you can foresee?
>
> Would be really glad to hear your thoughts on this matter.
>
> Best regards,
>
> Eric
