Hi Dano,

Thanks for your recommendation. I'll certainly keep that in mind.

From your answer, I infer that at least some of your data processing uses NiFi
as the choreographer. In my case, we use NiFi just to move data around, so it
performs a more limited role.

To give you some context: My goal is to recreate our on-prem Data Warehouse in 
the cloud, preferably using managed services. 

We're currently in the early stages of our migration, still deciding on how to 
make data from our systems of record accessible in Google Cloud. The sources 
include relational databases, file extracts, and REST APIs. I've decided to 
start with the batch-oriented sources, but the ultimate goal is streaming data 
processing. 

Currently, I'm running NiFi on-prem to copy JSON and CSV files to GCS and to 
publish data retrieved from databases to Cloud Pub/Sub topics. Cloud Functions 
then trigger the execution of Dataflow pipelines (sometimes controlled by 
Airflow) in response, and the resulting validated, enriched data are stored in 
BigQuery. 
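
To make the hand-off concrete, here is a rough sketch of the kind of logic a 
Cloud Function could use when a new file lands in GCS: it turns the storage 
event into the request body for launching a Dataflow template. The function 
name, the template parameter names (inputFile, outputTable), the table name, 
and the 40-character job-name cap are illustrative assumptions, not what we 
actually run. The actual launch call is left out.

```python
import re
from typing import Any, Dict


def build_launch_request(event: Dict[str, Any], output_table: str) -> Dict[str, Any]:
    """Build a Dataflow template launch body from a GCS object event.

    `event` is assumed to carry the usual "bucket" and "name" keys of a
    Cloud Storage trigger payload. "inputFile" and "outputTable" are
    hypothetical template parameters; a real template defines its own.
    """
    bucket = event["bucket"]
    name = event["name"]

    # Derive a conservative job name: lowercase letters, digits and hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    job_name = f"load-{slug}"[:40].rstrip("-")  # length cap is an assumption

    return {
        "jobName": job_name,
        "parameters": {
            "inputFile": f"gs://{bucket}/{name}",
            "outputTable": output_table,
        },
    }


if __name__ == "__main__":
    sample_event = {"bucket": "landing", "name": "exports/Orders_2019-07-28.csv"}
    print(build_launch_request(sample_event, "my-project:dw.orders"))
```

Keeping the request-building step as a pure function makes it easy to test 
without touching GCP; the function's entry point would only pass the result 
on to the Dataflow API.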

My NiFi flows on-prem usually start with ListFile, FetchFile, 
QueryDatabaseTable, or ConsumeKafka and end with a PutGCSObject or 
PublishGCPPubSub processor. (Before, the flows were doing a lot more, from 
format conversions to custom-made data processing, but I'm now trying to leave 
most of the hard work to Dataflow.) I intend to keep NiFi performing a similar 
role after I move the cluster over to the cloud.

Suggestions are always welcome. I sometimes find it frustrating to try to 
acquire all the necessary knowledge by myself. Much of it seems to be tribal 
knowledge. 

Thanks again,

Marcio



On Sunday, July 28, 2019, 10:17:01 a.m. EDT, dan young <[email protected]> 
wrote: 

Hello Márcio,

We've been running NiFi clusters for almost 3 years now at Looker on AWS. We 
will be moving these over to GCP in the future. My main recommendation is to 
ensure that you're using something like Ansible to help with the deployment and 
configuration of the cluster. We use a lot of ExecuteStreamCommand processors 
to run a variety of node workloads. 
 
Other than that, a lot will be specific to your use case, and your mileage 
will vary. 

Regards

Dano


On Fri, Jul 26, 2019, 10:27 PM Márcio Sugar <[email protected]> wrote:
> Hi,
> 
> Is there any tutorial, guide, or set of best practices that helps with 
> installing and using NiFi on Google Cloud (or any cloud provider, for that 
> matter)? 
> 
> Thank you,
> 
> Marcio
> 
