Hi PasLe,

We do that just fine with NiFi.

As for your first question, check my blog post
https://boristyukin.com/how-to-connect-apache-nifi-to-apache-impala/ - you
can connect to Impala and use the PutSQL and ExecuteSQL processors to
execute Impala SQL. You can also use the PutHDFS processor, or create the
HDFS files outside of NiFi (we use Sqoop for that, and I blogged about that
as well).
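
To make #1 more concrete, the statement itself is just a CREATE EXTERNAL
TABLE over the Parquet location, and that is what you would put in the
PutSQL flowfile. Here is a minimal sketch - the database, table, columns,
host and HDFS path are all made up, and I wrap it in impyla only so the
snippet runs stand-alone outside of NiFi:

# Minimal sketch: names, host and path below are placeholders, not our
# real setup. impyla is used only to make this runnable outside NiFi;
# in NiFi the same DDL would go into a PutSQL (or ExecuteSQL) flowfile.
from impala.dbapi import connect

conn = connect(host='impala-coordinator.example.com', port=21050)
cur = conn.cursor()

# External table over Parquet files that already sit in HDFS;
# Impala reads them in place, nothing is copied or converted.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.sales_parquet (
        id        BIGINT,
        amount    DOUBLE,
        sale_date STRING
    )
    STORED AS PARQUET
    LOCATION '/data/warehouse/sales_parquet'
""")

cur.close()
conn.close()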

To execute Spark jobs you have a few choices - the easiest is to run your
Spark app with the spark-submit CLI (we opted to do just that). Or you can
use the ExecuteSparkInteractive processor via Apache Livy, which is a bit
more involved to set up but easier to use afterwards.
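
The spark-submit job itself can be a thin wrapper that just feeds your SQL
scripts to Spark. A rough sketch below - the file names, paths and submit
command are illustrative, not our exact job:

# etl_job.py - minimal sketch of a Spark job that runs a SQL script.
# The script path, app name and submit command are placeholders.
# Launch it with something like:
#   spark-submit --master yarn etl_job.py /jobs/sql/daily_etl.sql
import sys

from pyspark.sql import SparkSession

def main(sql_path):
    spark = (SparkSession.builder
             .appName("sql-etl")
             .enableHiveSupport()   # so the SQL can see Hive metastore tables
             .getOrCreate())

    # Read the (possibly large) SQL file and run each statement in order.
    with open(sql_path) as f:
        statements = [s.strip() for s in f.read().split(";") if s.strip()]
    for stmt in statements:
        spark.sql(stmt)

    spark.stop()

if __name__ == "__main__":
    main(sys.argv[1])

From NiFi you can kick that command off with a processor like
ExecuteStreamCommand or ExecuteProcess if you want the whole pipeline
driven from the flow.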

Hope it helps,
Boris


On Sun, Jan 6, 2019 at 8:27 AM PasLe Choix <paslecho...@gmail.com> wrote:

> Hello and happy new year,
>
> I am exploring the feasibility of migrating our current pipeline to NiFi;
> here are the key things I want to know:
>
> 1. How do I create an external Impala table based on a Parquet location?
> 2. How do I create a Spark job which essentially consists of large and
> complex SQL scripts to ETL our data?
>
> Thank you very much.
>
> PC
>
