Re: SparkSubmitOperator with Azure HDInsight

2018-06-22 Thread Niels Zeilemaker
If you run in non-cluster (client) mode, the driver logs would be written to stdout, so Airflow would capture those. On Fri, 22 Jun 2018 at 18:04, Naik Kaxil wrote:

Re: SparkSubmitOperator with Azure HDInsight

2018-06-22 Thread Naik Kaxil
Thanks @Niels and @Kyle. @Niels - I agree, I don't want to copy Hadoop configurations to my Airflow VM. In this (using SSHOperator) case, Airflow would just be receiving stdout, right, as opposed to driver logs? @Kyle - If you can, then it would definitely be useful to have LivyOperators to

Re: SparkSubmitOperator with Azure HDInsight

2018-06-22 Thread Niels Zeilemaker
Hi Kaxil, I would recommend using the SSHOperator to start the Spark job on the master node of the HDInsight cluster. This avoids the problems associated with Livy, and doesn't require you to open ports or copy the Hadoop configuration to your Airflow machine. Niels
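A minimal sketch of what that SSHOperator approach might look like. The helper below builds the spark-submit command string to hand to the operator; the job path, YARN master, and conf values are assumptions for illustration, not from the thread:

```python
import shlex

# Hypothetical helper building the spark-submit command an SSHOperator task
# could run on the HDInsight master node. Client deploy mode keeps the driver
# logs on stdout, which the SSHOperator streams back into Airflow's task log.
def build_spark_submit(app_path, master="yarn", deploy_mode="client",
                       conf=None, app_args=None):
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    cmd += list(app_args or [])
    # Quote each part so paths or conf values with spaces survive the shell.
    return " ".join(shlex.quote(part) for part in cmd)

command = build_spark_submit("/jobs/etl.py",
                             conf={"spark.executor.memory": "4g"})
print(command)
```

The resulting string would then be passed as the `command` argument of an SSHOperator task pointing at an SSH connection to the cluster's master node.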

Re: SparkSubmitOperator with Azure HDInsight

2018-06-22 Thread Kyle Hamlin
I haven't used any Azure products, but I did build a Livy hook and operator so I could submit concurrent Spark jobs to EMR clusters. I was planning on contributing the code, but it's kind of a pain haha. If you're interested, I can take another stab at getting the Livy hook and operator contributed.
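For reference, a Livy hook of the kind Kyle describes boils down to POSTing a JSON batch definition to the Livy server. A minimal sketch of the payload builder (the field names follow the Apache Livy `/batches` REST API; the file path, args, and conf values are made up for illustration):

```python
import json

def livy_batch_payload(file, class_name=None, args=None, conf=None):
    """Build the JSON body for POST /batches on a Livy server.

    Each POST creates an independent batch session, which is what makes
    concurrent job submission straightforward compared to a local
    spark-submit process per job.
    """
    payload = {"file": file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    if conf:
        payload["conf"] = dict(conf)
    return payload

body = livy_batch_payload("wasb:///jobs/etl.py",
                          args=["2018-06-22"],
                          conf={"spark.executor.memory": "4g"})
print(json.dumps(body))
```

An operator wrapping this would POST the body to `http://<livy-host>:8998/batches` and then poll the returned batch id until the job reaches a terminal state.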

SparkSubmitOperator with Azure HDInsight

2018-06-22 Thread Naik Kaxil
Hi all, has anyone used the SparkSubmitOperator to submit Spark jobs on an Azure HDInsight cluster? Are you using Livy or spark-submit to run remote Spark jobs? Regards, Kaxil
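To make the two routes in the question concrete, a rough shell sketch of each; the hostname, file path, and conf are assumptions (Livy's default port is 8998):

```shell
# Route 1: plain spark-submit, executed on (or via SSH to) the cluster
# master node; no extra services needed, but Airflow must reach that node.
spark-submit --master yarn --deploy-mode cluster wasb:///jobs/etl.py

# Route 2: remote submission through Livy's REST API; Airflow only needs
# HTTP access to the Livy endpoint, no Hadoop config on the Airflow host.
curl -X POST -H 'Content-Type: application/json' \
     -d '{"file": "wasb:///jobs/etl.py"}' \
     http://headnode:8998/batches
```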