[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571989590

> > > > @tooptoop4 if we don't remove the spark check on line 177, how to use this hook to track driver status deployed on yarn, mesos, or k8s? Since I think `spark://` is only for standalone mode.
> > > > Or this hook is created only for standalone mode?
> > > > > > > > > yes. there is no concept of async driver status poll for other modes, read https://spark.apache.org/docs/latest/running-on-yarn.html ! in other modes the submit to launch is synchronous. i think u can cancel this @albertusk95
> > > > > > I couldn't find any info stating that there's no async driver polling for YARN anyway from the provided link.
> > There isn't async driver polling in YARN, I know Spark on YARN.

How about using Livy to interact with the YARN cluster? I guess it supports sync & async results retrieval.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571876662

> existing tests for connection added via db/cli needs to work well,

I guess the current tests don't support connections added via cli, right?
[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571424197

@tooptoop4 if we don't remove the spark check on line 177, how can this hook be used to track the status of a driver deployed on yarn, mesos, or k8s? As far as I know, `spark://` is only used for standalone mode.

Or was this hook created only for standalone mode?
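To make the question concrete, here is a minimal sketch of the kind of check under discussion. It assumes the check on line 177 amounts to testing whether the master URL starts with `spark://` and the deploy mode is `cluster`; the function name `should_track_driver_status` and the sample master URLs are hypothetical and not copied from the hook.

```python
def should_track_driver_status(master: str, deploy_mode: str) -> bool:
    """Hypothetical helper: allow driver status polling only for
    standalone masters (spark://...) submitted in cluster deploy mode."""
    return master.startswith("spark://") and deploy_mode == "cluster"


if __name__ == "__main__":
    # Sample master URLs for each cluster manager (illustrative values only).
    samples = [
        "spark://master-address:7077",  # standalone
        "yarn",                         # YARN
        "mesos://master-address:5050",  # Mesos
        "k8s://https://master-address:443",  # Kubernetes
    ]
    for master in samples:
        print(master, should_track_driver_status(master, "cluster"))
```

With a check like this, only the standalone master passes, which is why the other cluster managers cannot use driver status tracking through this code path.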
[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571429121

@tooptoop4 the tests pass only when the connection information (host, port, conn_type, etc.) is stored in the **database**.

I tried this hook with the connection info stored as an **environment variable**. This failed because the URI parser returned unusable results for all types of cluster mode deployment. For instance, `URI=spark://host:port` is parsed into `host:port` without the `spark://`. As a result, it raises this exception:

```
airflow.exceptions.AirflowException: Cannot execute: [path/to/spark-submit, '--master', host:port, job_file.py]
```
[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571429121

@tooptoop4 the tests pass only when the connection information (host, port, conn_type, etc.) is stored in the **database**.

I tried this hook with the connection info stored as an **environment variable**. This failed because the URI parser returned unusable results for all types of cluster mode deployment. For instance, `URI=spark://master-address:port` is parsed into `master-address:port` without the `spark://`. As a result, it raises this exception:

```
airflow.exceptions.AirflowException: Cannot execute: [path/to/spark-submit, '--master', host:port, job_file.py]
```
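The behaviour described above can be illustrated with a minimal sketch that uses only the standard library `urllib.parse.urlparse`. It assumes that the connection URI is split into a scheme (stored as the connection type) plus host and port, and that the master is then rebuilt from host and port alone; the helper names `split_connection_uri`, `naive_master`, and `master_with_scheme`, and the port `7077`, are hypothetical and are not Airflow's API.

```python
from urllib.parse import urlparse


def split_connection_uri(uri: str) -> dict:
    """Hypothetical helper: split a connection URI into separate fields."""
    parsed = urlparse(uri)
    return {
        "conn_type": parsed.scheme,  # e.g. "spark"
        "host": parsed.hostname,     # e.g. "master-address"
        "port": parsed.port,         # e.g. 7077
    }


def naive_master(conn: dict) -> str:
    """Rebuild the master from host and port only; the scheme is lost."""
    return f"{conn['host']}:{conn['port']}"


def master_with_scheme(conn: dict) -> str:
    """Rebuild the master and put the scheme back in front."""
    return f"{conn['conn_type']}://{conn['host']}:{conn['port']}"


if __name__ == "__main__":
    conn = split_connection_uri("spark://master-address:7077")
    print(naive_master(conn))        # value passed to --master without the scheme
    print(master_with_scheme(conn))  # value with the scheme restored
```

Whether restoring the scheme in this way is the right fix for the hook is a separate question; the sketch only shows where the `spark://` prefix goes missing.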
[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571429121

@tooptoop4 the tests pass only when the connection information (host, port, conn_type, etc.) is stored in the **database**.

I tried this hook with the connection info stored as an **environment variable**. This failed because the URI parser returned unusable results for all types of cluster mode deployment (yarn, standalone, mesos, k8s).
[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571424197

@tooptoop4 if we don't remove the spark check on line 177, how can this hook be used to track the status of a driver deployed on yarn, mesos, or k8s? As far as I know, `spark://` is only used for standalone mode.