Thanks Bolke. That's awesome.
1)
So each task would create its own Spark session?
Is there a way to share a Spark session between tasks, like discussed in this
email chain?
2)
Looks like SparkSqlHook calls the `spark-sql` shell with all those parameters?
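For illustration, here is a rough sketch of the pattern a hook like that would follow: assemble a `spark-sql` command line from its parameters and hand it to a subprocess. The function name and parameter set below are hypothetical, not the actual SparkSqlHook implementation.

```python
# Hypothetical sketch of assembling a `spark-sql` invocation from hook
# parameters. A real hook would pass the result to subprocess.Popen.
def build_spark_sql_cmd(sql, conf=None, master=None, name=None):
    cmd = ["spark-sql", "-e", sql]  # -e runs an inline SQL statement
    if master:
        cmd += ["--master", master]
    if name:
        cmd += ["--name", name]
    # each extra Spark setting becomes its own --conf key=value pair
    for key, value in (conf or {}).items():
        cmd += ["--conf", "%s=%s" % (key, value)]
    return cmd

cmd = build_spark_sql_cmd(
    "SELECT 1",
    master="yarn",
    conf={"spark.executor.memory": "2g"},
)
print(" ".join(cmd))
```

Note that because each such call launches a fresh `spark-sql` process, every task pays the session startup cost again, which is exactly the sharing question above.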
A Spark operator exists as of 1.8.0 (which will be released tomorrow); you
might want to take a look at it. I know an update is coming to that
operator that improves its communication with YARN.
Bolke
> On 18 Mar 2017, at 18:43, Russell Jurney wrote:
Ruslan, thanks for your feedback.
You mean the spark-submit context? Or the SparkContext and
SparkSession? I don't think we could keep that alive, because it wouldn't
survive across multiple calls to spark-submit. I do feel your pain, though.
Maybe someone else can see how this might be done?
+1 Great idea.
my two cents - it would be nice (as an option) if SparkOperator could
keep the context open between calls,
as it takes 30+ seconds to create a new context (on our cluster). Not sure
how well that fits the Airflow architecture.
--
Ruslan Dautkhanov
On Sat, Mar 18,