[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-08-08 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572855#comment-16572855 ]

ASF subversion and git services commented on AIRFLOW-2140:
--

Commit f58246d2ef265eb762c179a12c40e011ce62cea1 in incubator-airflow's branch 
refs/heads/v1-10-test from [~ashb]
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=f58246d ]

[AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook (#3700)

This extra dep is a quasi-breaking change when upgrading - previously
there were no deps outside of Airflow itself for this hook. Importing
the k8s libs breaks installs that aren't also using Kubernetes.

This makes the dep optional for anyone who doesn't explicitly use the
functionality

(cherry picked from commit 0be002eebb182b607109a0390d7f6fb8795c668b)
Signed-off-by: Bolke de Bruin 
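The approach the commit message describes, i.e. importing the kubernetes libraries lazily and failing only when the feature is used, follows a common optional-dependency pattern. A hedged sketch of that pattern (not the exact Airflow code; the `require_kubernetes` helper is invented for illustration):

```python
# Illustration of the optional-dependency pattern the commit describes.
# `require_kubernetes` is a made-up helper for this sketch, not Airflow API.
try:
    from kubernetes.client.rest import ApiException  # optional extra
    has_kubernetes = True
except ImportError as err:
    # Bind a real exception class so `except ApiException:` still parses
    # in modules that never touch Kubernetes.
    ApiException = BaseException
    has_kubernetes = False
    _import_err = err


def require_kubernetes():
    """Fail only on the code paths that genuinely need the k8s client."""
    if not has_kubernetes:
        raise _import_err
```

Modules that never call `require_kubernetes` import cleanly whether or not the kubernetes package is installed, which is exactly what removes the quasi-breaking dependency on upgrade.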


> Add Kubernetes Scheduler to Spark Submit Operator
> -
>
> Key: AIRFLOW-2140
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2140
> Project: Apache Airflow
>  Issue Type: New Feature
>Affects Versions: 1.9.0
>Reporter: Rob Keevil
>Assignee: Rob Keevil
>Priority: Major
> Fix For: 2.0.0
>
>
> Spark 2.3 adds the Kubernetes resource manager to Spark, alongside the 
> existing Standalone, Yarn and Mesos resource managers. 
> https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md
> We should extend the spark submit operator to enable the new K8s spark submit 
> options, and to be able to monitor Spark jobs running within Kubernetes.
> I already have working code for this; I still need to test the monitoring/log 
> parsing code and make sure that Airflow is able to terminate Kubernetes pods 
> when jobs are cancelled, etc.
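For context, Spark 2.3's Kubernetes support (per the running-on-kubernetes documentation linked above) is driven through spark-submit with a `k8s://` master URL. A sketch of the general shape; the API-server address and container image are placeholders:

```shell
# Placeholder values throughout - illustrative Spark 2.3 submission to
# Kubernetes, per the running-on-kubernetes documentation linked above.
spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```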



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572143#comment-16572143 ]

ASF GitHub Bot commented on AIRFLOW-2140:
-

bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes 
for the SparkSubmit hook
URL: https://github.com/apache/incubator-airflow/pull/3700
 
 
   

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/contrib/hooks/spark_submit_hook.py 
b/airflow/contrib/hooks/spark_submit_hook.py
index 0185cab283..65bb6134e6 100644
--- a/airflow/contrib/hooks/spark_submit_hook.py
+++ b/airflow/contrib/hooks/spark_submit_hook.py
@@ -26,7 +26,6 @@
 from airflow.exceptions import AirflowException
 from airflow.utils.log.logging_mixin import LoggingMixin
 from airflow.contrib.kubernetes import kube_client
-from kubernetes.client.rest import ApiException


 class SparkSubmitHook(BaseHook, LoggingMixin):
@@ -136,6 +135,10 @@ def __init__(self,
         self._connection = self._resolve_connection()
         self._is_yarn = 'yarn' in self._connection['master']
         self._is_kubernetes = 'k8s' in self._connection['master']
+        if self._is_kubernetes and kube_client is None:
+            raise RuntimeError(
+                "{master} specified by kubernetes dependencies are not installed!".format(
+                    self._connection['master']))

         self._should_track_driver_status = self._resolve_should_track_driver_status()
         self._driver_id = None
@@ -559,6 +562,6 @@ def on_kill(self):

             self.log.info("Spark on K8s killed with response: %s", api_response)

-        except ApiException as e:
+        except kube_client.ApiException as e:
             self.log.info("Exception when attempting to kill Spark on K8s:")
             self.log.exception(e)
diff --git a/airflow/contrib/kubernetes/kube_client.py 
b/airflow/contrib/kubernetes/kube_client.py
index 8b71f41242..4b8fa17155 100644
--- a/airflow/contrib/kubernetes/kube_client.py
+++ b/airflow/contrib/kubernetes/kube_client.py
@@ -17,9 +17,21 @@
 from airflow.configuration import conf
 from six import PY2

+try:
+    from kubernetes import config, client
+    from kubernetes.client.rest import ApiException
+    has_kubernetes = True
+except ImportError as e:
+    # We need an exception class to be able to use it in ``except`` elsewhere
+    # in the code base
+    ApiException = BaseException
+    has_kubernetes = False
+    _import_err = e
+

 def _load_kube_config(in_cluster, cluster_context, config_file):
-    from kubernetes import config, client
+    if not has_kubernetes:
+        raise _import_err
     if in_cluster:
         config.load_incluster_config()
     else:

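The `_is_kubernetes` check quoted in the diff is a plain substring match on the connection's master URL. A standalone sketch of that decision (the function name is invented for illustration, not Airflow's API):

```python
def detect_resource_manager(master):
    """Coarse classification of a Spark master URL by substring match.

    Mirrors the checks in the diff above ('k8s' / 'yarn' in the master
    string); illustrative only, not Airflow's actual API.
    """
    if 'k8s' in master:
        return 'kubernetes'
    if 'yarn' in master:
        return 'yarn'
    return 'other'
```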

 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-08-07 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572144#comment-16572144 ]

ASF subversion and git services commented on AIRFLOW-2140:
--

Commit 0be002eebb182b607109a0390d7f6fb8795c668b in incubator-airflow's branch 
refs/heads/master from [~ashb]
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=0be002e ]

[AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook (#3700)

This extra dep is a quasi-breaking change when upgrading - previously
there were no deps outside of Airflow itself for this hook. Importing
the k8s libs breaks installs that aren't also using Kubernetes.

This makes the dep optional for anyone who doesn't explicitly use the
functionality



[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-03-12 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395080#comment-16395080 ]

ASF subversion and git services commented on AIRFLOW-2140:
--

Commit 64100d2a2e0b9dafbd2f0b355d414781d98f41c9 in incubator-airflow's branch 
refs/heads/master from [~RJKeevil]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=64100d2 ]

[AIRFLOW-2140] Add Kubernetes scheduler to SparkSubmitOperator

Closes #3112 from RJKeevil/spark-k8s




[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-03-07 Thread Rob Keevil (JIRA)

[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389598#comment-16389598 ]

Rob Keevil commented on AIRFLOW-2140:
-

The on_kill code is written, but is currently never called by Airflow. This will 
need to be retested once AIRFLOW-1623 has been fixed.

I also set the spark-submit log level to info instead of debug, as this is very 
important information to see in the logs (i.e. whether the submit failed). This 
may be overly verbose in some environments.
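The log-level point can be seen with a minimal standalone snippet (plain `logging`, unrelated to Airflow's actual logger setup): at an INFO threshold, DEBUG output is dropped, so a submit failure logged at info reaches the task log while one at debug would not.

```python
import io
import logging

# Capture log output in a string buffer so we can inspect it.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("spark_submit_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.debug("spark-submit stdout line")        # dropped at INFO level
logger.info("spark-submit exited with code 1")  # survives

output = stream.getvalue()
```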
