Hi PasLe, we do that just fine with NiFi.
As for your first question, check my blog post https://boristyukin.com/how-to-connect-apache-nifi-to-apache-impala/ - you can connect to Impala and use the PutSQL and ExecuteSQL processors to execute Impala SQL. You can also use the PutHDFS processor, or create the HDFS files outside of NiFi (we use sqoop for that, and I blogged about it as well).

To execute Spark jobs, you have a few choices - the easiest is to run your Spark app with the spark-submit CLI (we opted to do just that). Or you can use the ExecuteSparkInteractive processor via Apache Livy, which is a bit more involved to set up but easier to use after that. I have appended a couple of rough sketches below the quoted message to make both answers more concrete.

Hope it helps,
Boris

On Sun, Jan 6, 2019 at 8:27 AM PasLe Choix <paslecho...@gmail.com> wrote:

> Hello and happy new year,
>
> I am seeking the feasibility to migrate our current pipeline to NiFi, here
> are key things I want to know:
>
> 1. How do I create external impala table based on parquet location?
> 2. How do I create spark job which essentially contains large and complex
> sql scripts to ETL our data
>
> Thank you very much.
>
> PC
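
Sketch 1, for the external table question: this is the kind of CREATE EXTERNAL TABLE ... STORED AS PARQUET ... LOCATION statement you would drop into a PutSQL or ExecuteSQL processor pointed at Impala. I am wrapping it in a small Python/impyla script only so the example is self-contained; the database, table, columns, host and HDFS path are made-up placeholders, not anything from our pipeline.

# Minimal sketch: run an Impala DDL statement that creates an external
# table over an existing Parquet location. All identifiers and the path
# below are placeholders.
from impala.dbapi import connect

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.events (
  event_id BIGINT,
  event_ts TIMESTAMP,
  payload  STRING
)
STORED AS PARQUET
LOCATION '/data/warehouse/events'
"""

# 21050 is the usual Impala HiveServer2-compatible port
conn = connect(host='impala-daemon-host', port=21050)
cur = conn.cursor()
cur.execute(ddl)
cur.close()
conn.close()

In NiFi you would put the same statement into the processor's SQL property instead of running it from Python.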
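
Sketch 2, for the Spark question: a minimal PySpark app that reads a file of SQL statements and runs them through spark.sql, which is roughly what a spark-submit-driven ETL job can look like. The app name, the script path argument and the naive split-on-semicolon parsing are assumptions for illustration, not our actual job.

# Minimal sketch: execute a file of SQL statements against the Hive
# metastore from a Spark job. Paths and names are placeholders.
import sys
from pyspark.sql import SparkSession

def main(sql_path):
    spark = (SparkSession.builder
             .appName("nifi-triggered-etl")
             .enableHiveSupport()
             .getOrCreate())

    # the "large and complex SQL" lives in a file; split on ';' and run
    # each statement in order (good enough for simple scripts)
    with open(sql_path) as f:
        statements = [s.strip() for s in f.read().split(';') if s.strip()]
    for stmt in statements:
        spark.sql(stmt)

    spark.stop()

if __name__ == "__main__":
    main(sys.argv[1])

You would launch it with something like spark-submit etl_job.py /path/to/etl.sql - how you trigger that command from NiFi (ExecuteProcess, ExecuteStreamCommand, or an external scheduler) is up to you.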