[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection

2020-01-08 Thread GitBox
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook 
resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571989590
 
 
   > > > > @tooptoop4 if we don't remove the spark check on line 177, how can this hook be used to track the status of a driver deployed on YARN, Mesos, or Kubernetes? I think `spark://` is only for standalone mode.
   > > > > Or is this hook created only for standalone mode?
   > > > 
   > > > 
   > > > Yes. There is no concept of an async driver status poll for the other modes; read https://spark.apache.org/docs/latest/running-on-yarn.html. In the other modes the submit/launch is synchronous. I think you can cancel this, @albertusk95.
   > > 
   > > 
   > > I couldn't find any information in the provided link stating that there's no async driver polling for YARN.
   > 
   > There isn't async driver polling in YARN; I know Spark on YARN.
   
   How about using Livy to interact with the YARN cluster? I guess it supports both synchronous and asynchronous result retrieval.
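   As a rough sketch of what that could look like (the Livy host, job file path, and poll interval below are placeholders, not anything from this PR), Livy's batch REST API returns a batch id immediately on `POST /batches` and lets you poll `GET /batches/{id}/state` afterwards:
   
   ```python
   # Minimal sketch: submit a Spark job through Livy and poll its state.
   # Assumes a reachable Livy server; host and file path are placeholders.
   import time
   
   import requests
   
   LIVY_URL = "http://livy-host:8998"  # hypothetical Livy endpoint
   
   
   def submit_batch(file_path):
       """POST /batches returns a batch id right away (async submission)."""
       resp = requests.post("{}/batches".format(LIVY_URL), json={"file": file_path})
       resp.raise_for_status()
       return resp.json()["id"]
   
   
   def wait_for_batch(batch_id, interval=10):
       """Poll GET /batches/{id}/state until the batch reaches a terminal state."""
       while True:
           state = requests.get(
               "{}/batches/{}/state".format(LIVY_URL, batch_id)).json()["state"]
           if state in ("success", "dead", "killed", "error"):
               return state
           time.sleep(interval)
   
   
   batch_id = submit_batch("hdfs:///jobs/job_file.py")
   print(wait_for_batch(batch_id))
   ```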


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection

2020-01-07 Thread GitBox
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook 
resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571876662
 
 
   > existing tests for connections added via db/cli need to work
   
   Well, I guess the current tests don't cover connections added via the CLI, right?
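   For illustration, a test covering a connection supplied through the `AIRFLOW_CONN_<CONN_ID>` environment variable could look roughly like the sketch below. The conn id, URI, and expected master value are made up, the call to the hook's private `_resolve_connection` is assumed from this PR's context, and the assertion describes the scheme-preserving behaviour being argued for here rather than what the current hook returns.
   
   ```python
   # Sketch of a test for a connection defined via AIRFLOW_CONN_<CONN_ID>
   # instead of the metadata database. Conn id, URI, and expected value are
   # illustrative; the assertion reflects the desired behaviour under
   # discussion in this PR, not the released implementation.
   import unittest
   from unittest import mock
   
   from airflow.contrib.hooks.spark_submit_hook import SparkSubmitHook
   
   
   class TestSparkSubmitHookEnvConnection(unittest.TestCase):
       @mock.patch.dict(
           "os.environ",
           {"AIRFLOW_CONN_SPARK_STANDALONE": "spark://spark-master:7077"},
       )
       def test_resolve_connection_from_env_var(self):
           hook = SparkSubmitHook(conn_id="spark_standalone")
           conn_data = hook._resolve_connection()
           # Desired outcome: the scheme survives so --master stays valid.
           self.assertEqual(conn_data["master"], "spark://spark-master:7077")
   
   
   if __name__ == "__main__":
       unittest.main()
   ```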


[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection

2020-01-06 Thread GitBox
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook 
resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571424197
 
 
   @tooptoop4 if we don't remove the spark check on line 177, how can this hook be used to track the status of a driver deployed on YARN, Mesos, or Kubernetes? I think `spark://` is only for standalone mode.
   
   Or is this hook created only for standalone mode?
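   For context, the kind of status poll that check guards looks roughly like the sketch below against a standalone master's REST submission endpoint (default port 6066); the master URL and driver id are placeholders, and YARN does not expose an equivalent `spark-submit --status` call, which is what this question is getting at.
   
   ```python
   # Rough sketch of the standalone cluster-mode driver-status query that the
   # `spark://` check on line 177 guards. The master URL (standalone REST
   # submission endpoint, default port 6066) and driver id are placeholders.
   import subprocess
   
   
   def poll_driver_status(master, driver_id):
       """Run `spark-submit --status` and return its combined output."""
       cmd = ["spark-submit", "--master", master, "--status", driver_id]
       result = subprocess.run(cmd, capture_output=True, text=True, check=False)
       return result.stdout + result.stderr
   
   
   print(poll_driver_status("spark://spark-master:6066",
                            "driver-20200106123456-0000"))
   ```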


[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection

2020-01-06 Thread GitBox
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook 
resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571429121
 
 
   @tooptoop4 the tests pass only for the case where the connection information (host, port, conn_type, etc.) is stored in the **database**. I tried this hook with the connection info stored as an **environment variable**.
   
   This failed because the URI parser returned irrelevant results for every type of cluster-mode deployment. For instance, `URI=spark://host:port` is parsed into `host:port` without the `spark://` prefix, which then produces this exception:
   
   ```
   airflow.exceptions.AirflowException: Cannot execute: [path/to/spark-submit, '--master', host:port, job_file.py]
   ```
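   To illustrate (placeholder values): an `AIRFLOW_CONN_<CONN_ID>` value goes through Airflow's connection-URI parsing, so the scheme ends up as `conn_type` and the host keeps no prefix, which means `--master` has to be rebuilt with the scheme put back. A minimal sketch of that, assuming a standalone-style URI:
   
   ```python
   # Sketch (placeholder values): how a connection URI is split up, and one
   # possible way to rebuild a usable --master value by restoring the scheme.
   from airflow.models import Connection
   
   conn = Connection(conn_id="spark_standalone", uri="spark://spark-master:7077")
   print(conn.conn_type, conn.host, conn.port)  # -> spark spark-master 7077
   
   # Rebuilding --master from host:port alone drops the scheme, which is what
   # triggers the exception above; putting conn_type back keeps it valid.
   master = "{}://{}:{}".format(conn.conn_type, conn.host, conn.port)
   print(master)  # -> spark://spark-master:7077
   ```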


[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection

2020-01-06 Thread GitBox
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook 
resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571429121
 
 
   @tooptoop4 the tests pass only for the case where the connection information (host, port, conn_type, etc.) is stored in the **database**. I tried this hook with the connection info stored as an **environment variable**. This failed because the URI parser returned irrelevant results for every type of cluster-mode deployment (yarn, standalone, mesos, k8s).


[GitHub] [airflow] albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection

2020-01-06 Thread GitBox
albertusk95 edited a comment on issue #7075: [AIRFLOW-6212] SparkSubmitHook 
resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571424197
 
 
   @tooptoop4 if we don't remove the spark check on line 177, how can this hook be used to track the status of a driver deployed on YARN, Mesos, or Kubernetes? I think `spark://` is only for standalone mode.

