[jira] [Commented] (SPARK-39546) Respect port definitions on K8S pod templates for both driver and executor

2022-09-06 Thread Oliver Koeth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600669#comment-17600669
 ] 

Oliver Koeth commented on SPARK-39546:
--

Yes, looks like this should do it. Thank you

> Respect port definitions on K8S pod templates for both driver and executor
> 
>
> Key: SPARK-39546
> URL: https://issues.apache.org/jira/browse/SPARK-39546
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Oliver Koeth
>Priority: Minor
>
> *Description:*
> Spark on K8S allows opening additional ports for custom purposes on the 
> driver pod via the pod template, but it ignores the port specification in the 
> executor pod template. Port specifications from the pod template should be 
> preserved (and extended) for both drivers and executors.
> *Scenario:*
> I want to run functionality in the executor that exposes data on an 
> additional port. In my case, this is monitoring data exposed by Spark's JMX 
> metrics sink via the JMX Prometheus exporter Java agent 
> https://github.com/prometheus/jmx_exporter -- the Java agent opens an extra 
> port inside the container, but for Prometheus to detect and scrape the port, 
> it must be exposed in the K8S pod resource.
> (More background if desired: this seems to be the "classic" Spark 2 way to 
> expose Prometheus metrics. Spark 3 introduced a native equivalent servlet for 
> the driver, but for the executor, only a rather limited set of metrics is 
> forwarded via the driver, and that also follows a completely different naming 
> scheme. So the JMX + exporter approach still turns out to be more useful for 
> me, even in Spark 3.)
> *Expected behavior:*
> I add the following to my pod template to expose the extra port opened by the 
> JMX exporter java agent
> spec:
>   containers:
>   - ...
>     ports:
>     - containerPort: 8090
>       name: jmx-prometheus
>       protocol: TCP
> *Observed behavior:*
> The port is exposed for driver pods but not for executor pods.
> *Corresponding code:*
> driver pod creation just adds ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala]
>  (currently line 115)
> val driverContainer = new ContainerBuilder(pod.container)
> ...
>   .addNewPort()
> ...
>   .addNewPort()
> while executor pod creation replaces the ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala]
>  (currently line 211)
> val executorContainer = new ContainerBuilder(pod.container)
> ...
>   .withPorts(requiredPorts.asJava)
> The current handling is inconsistent and unnecessarily limiting. It seems that 
> the executor creation could/should just as well preserve ports from the 
> template and add the extra required ports.
> *Workaround:*
> It is possible to work around this limitation by adding a full sidecar 
> container to the executor pod spec which declares the port. Sidecar 
> containers are left unchanged by pod template handling.
> As all containers in a pod share the same network, it does not matter which 
> container actually declares to expose the port.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39546) Respect port definitions on K8S pod templates for both driver and executor

2022-06-21 Thread Oliver Koeth (Jira)
Oliver Koeth created SPARK-39546:


 Summary: Respect port definitions on K8S pod templates for both 
driver and executor
 Key: SPARK-39546
 URL: https://issues.apache.org/jira/browse/SPARK-39546
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Oliver Koeth


*Description:*

Spark on K8S allows opening additional ports for custom purposes on the driver 
pod via the pod template, but it ignores the port specification in the executor 
pod template. Port specifications from the pod template should be preserved 
(and extended) for both drivers and executors.

*Scenario:*

I want to run functionality in the executor that exposes data on an additional 
port. In my case, this is monitoring data exposed by Spark's JMX metrics sink 
via the JMX Prometheus exporter Java agent 
https://github.com/prometheus/jmx_exporter -- the Java agent opens an extra 
port inside the container, but for Prometheus to detect and scrape the port, it 
must be exposed in the K8S pod resource.
(More background if desired: this seems to be the "classic" Spark 2 way to 
expose Prometheus metrics. Spark 3 introduced a native equivalent servlet for 
the driver, but for the executor, only a rather limited set of metrics is 
forwarded via the driver, and that also follows a completely different naming 
scheme. So the JMX + exporter approach still turns out to be more useful for 
me, even in Spark 3.)
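For reference, the agent is typically attached to the executor JVM via Spark's extraJavaOptions setting; the jar path, port, and config file below are illustrative placeholders, not values taken from this issue:

```properties
# hypothetical paths -- the jmx_exporter agent takes <port>:<config.yaml>
spark.executor.extraJavaOptions=-javaagent:/opt/jmx/jmx_prometheus_javaagent.jar=8090:/opt/jmx/config.yaml
```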

*Expected behavior:*

I add the following to my pod template to expose the extra port opened by the 
JMX exporter java agent

spec:
  containers:
  - ...
    ports:
    - containerPort: 8090
      name: jmx-prometheus
      protocol: TCP

*Observed behavior:*

The port is exposed for driver pods but not for executor pods.


*Corresponding code:*

driver pod creation just adds ports
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala]
 (currently line 115)

val driverContainer = new ContainerBuilder(pod.container)
...
  .addNewPort()
...
  .addNewPort()

while executor pod creation replaces the ports
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala]
 (currently line 211)

val executorContainer = new ContainerBuilder(pod.container)
...
  .withPorts(requiredPorts.asJava)


The current handling is inconsistent and unnecessarily limiting. It seems that 
the executor creation could/should just as well preserve ports from the template 
and add the extra required ports.
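A consistent executor behavior would preserve the template's ports and append Spark's required ports. A minimal model of that merge (illustrative Python, not the actual Scala code; the dicts stand in for K8S ContainerPort objects):

```python
# Illustrative model of the proposed behavior: keep ports declared in the
# executor pod template and add Spark's required ports, instead of replacing
# the whole list as BasicExecutorFeatureStep currently does.
def merge_ports(template_ports, required_ports):
    merged = list(template_ports)
    declared = {(p.get("name"), p["containerPort"]) for p in template_ports}
    for port in required_ports:
        key = (port.get("name"), port["containerPort"])
        if key not in declared:  # avoid duplicate declarations
            merged.append(port)
    return merged

template = [{"name": "jmx-prometheus", "containerPort": 8090, "protocol": "TCP"}]
required = [{"name": "blockmanager", "containerPort": 7079, "protocol": "TCP"}]
# merge_ports(template, required) keeps the template port and adds the required one
```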


*Workaround:*

It is possible to work around this limitation by adding a full sidecar 
container to the executor pod spec which declares the port. Sidecar containers 
are left unchanged by pod template handling.
As all containers in a pod share the same network, it does not matter which 
container actually declares to expose the port.
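The sidecar workaround could look like the following executor pod template fragment (a sketch; the sidecar name and no-op image are placeholders, not from this issue):

```yaml
spec:
  containers:
  - ...                               # the executor container itself
  - name: port-declare-sidecar        # hypothetical sidecar that only declares the port
    image: registry.k8s.io/pause:3.9  # minimal no-op container image
    ports:
    - containerPort: 8090
      name: jmx-prometheus
      protocol: TCP
```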






[jira] [Comment Edited] (SPARK-20044) Support Spark UI behind front-end reverse proxy using a path prefix

2017-03-28 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944829#comment-15944829
 ] 

Oliver Koeth edited comment on SPARK-20044 at 3/28/17 9:09 AM:
---

The field "webuiaddress" in the /json response does not reflect the 
proxy, but returns the internal worker host/port. I left this unchanged from 
SPARK-15487.
The underlying field WorkerInfo.webUiAddress does not reflect proxy on purpose, 
because this is what the master uses to build the proxy routing 
{noformat}
if (reverseProxy) {
   webUi.addProxyTargets(worker.id, worker.webUiAddress)
}
{noformat}
If the returned JSON should rather show the "external" proxy address, one could 
either fix up the JSON (without affecting workerInfo) in MasterPage.renderJson 
or add a new property "externalwebuiaddress" to WorkerInfo (normally equal to 
webuiaddress, except in case of reverse proxy)
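The second option could be modeled as follows (an illustrative sketch, not Spark code; the function name is hypothetical). It derives the external address from the /proxy/<worker.id> routing shown above:

```python
# Hypothetical sketch of the "externalwebuiaddress" option: the master maps
# /proxy/<worker.id> to each worker's internal UI, so the external address is
# the reverse proxy URL plus that routing path.
def external_webui_address(reverse_proxy_url, worker_id, internal_address):
    if not reverse_proxy_url:
        return internal_address  # no reverse proxy: report the internal address
    return reverse_proxy_url.rstrip("/") + "/proxy/" + worker_id
```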


was (Author: okoethibm):
The field "webuiaddress" in the /json response does not reflect the 
proxy, but returns the internal worker host/port. I left this unchanged from 
SPARK-15487.
The underlying field WorkerInfo.webUiAddress does not reflect proxy on purpose, 
because this is what the master uses to build the proxy routing
if (reverseProxy) {
   webUi.addProxyTargets(worker.id, worker.webUiAddress)
}

If the returned JSON should rather show the "external" proxy address, one could 
either fix up the JSON (without affecting workerInfo) in MasterPage.renderJson 
or add a new property "externalwebuiaddress" to WorkerInfo (normally equal to 
webuiaddress, except in case of reverse proxy)

> Support Spark UI behind front-end reverse proxy using a path prefix
> ---
>
> Key: SPARK-20044
> URL: https://issues.apache.org/jira/browse/SPARK-20044
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: Oliver Koeth
>Priority: Minor
>  Labels: reverse-proxy, sso
>
> Purpose: allow to run the Spark web UI behind a reverse proxy with URLs 
> prefixed by a context root, like www.mydomain.com/spark. In particular, this 
> allows to access multiple Spark clusters through the same virtual host, only 
> distinguishing them by context root, like www.mydomain.com/cluster1, 
> www.mydomain.com/cluster2, and it allows to run the Spark UI in a common 
> cookie domain (for SSO) with other services.
> [SPARK-15487] introduced some support for front-end reverse proxies by 
> allowing all Spark UI requests to be routed through the master UI as a single 
> endpoint and also added a spark.ui.reverseProxyUrl setting to define a 
> another proxy sitting in front of Spark. However, as noted in the comments on 
> [SPARK-15487], this mechanism does not currently work if the reverseProxyUrl 
> includes a context root like the examples above: Most links generated by the 
> Spark UI result in full path URLs (like /proxy/app-"id"/...) that do not 
> account for a path prefix (context root) and work only if the Spark UI "owns" 
> the entire virtual host. In fact, the only place in the UI where the 
> reverseProxyUrl seems to be used is the back-link from the worker UI to the 
> master UI.
> The discussion on [SPARK-15487] proposes to open a new issue for the problem, 
> but that does not seem to have happened, so this issue aims to address the 
> remaining shortcomings of spark.ui.reverseProxyUrl
> The problem can be partially worked around by doing content rewrite in a 
> front-end proxy and prefixing src="/..." or href="/..." links with a context 
> root. However, detecting and patching URLs in HTML output is not a robust 
> approach and breaks down for URLs included in custom REST responses. E.g. the 
> "allexecutors" REST call used from the Spark 2.1.0 application/executors page 
> returns links for log viewing that direct to the worker UI and do not work in 
> this scenario.
> This issue proposes to honor spark.ui.reverseProxyUrl throughout Spark UI URL 
> generation. Experiments indicate that most of this can simply be achieved by 
> using/prepending spark.ui.reverseProxyUrl to the existing spark.ui.proxyBase 
> system property. Beyond that, the places that require adaption are
> - worker and application links in the master web UI
> - webui URLs returned by REST interfaces
> Note: It seems that returned redirect location headers do not need to be 
> adapted, since URL rewriting for these is commonly done in front-end proxies 
> and has a well-defined interface






[jira] [Commented] (SPARK-20044) Support Spark UI behind front-end reverse proxy using a path prefix

2017-03-28 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944829#comment-15944829
 ] 

Oliver Koeth commented on SPARK-20044:
--

The field "webuiaddress" in the /json response does not reflect the 
proxy, but returns the internal worker host/port. I left this unchanged from 
SPARK-15487.
The underlying field WorkerInfo.webUiAddress does not reflect proxy on purpose, 
because this is what the master uses to build the proxy routing
{noformat}
if (reverseProxy) {
   webUi.addProxyTargets(worker.id, worker.webUiAddress)
}
{noformat}

If the returned JSON should rather show the "external" proxy address, one could 
either fix up the JSON (without affecting workerInfo) in MasterPage.renderJson 
or add a new property "externalwebuiaddress" to WorkerInfo (normally equal to 
webuiaddress, except in case of reverse proxy)







[jira] [Comment Edited] (SPARK-20044) Support Spark UI behind front-end reverse proxy using a path prefix

2017-03-28 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944778#comment-15944778
 ] 

Oliver Koeth edited comment on SPARK-20044 at 3/28/17 8:43 AM:
---

Added a pull request. Works for me with the following configuration:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/

nginx front-end proxy setup:
{noformat}
server {
listen 9000;
set $SPARK_MASTER http://<spark-master-host>:8080;

# redirect master UI path without terminating slash,
# so that relative URLs are resolved correctly
location ~ ^(?<prefix>/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}

# split spark UI path into prefix and local path within master UI
location ~ ^(?<prefix>/path/to/spark)(?<local_path>/.*) {
# strip prefix when forwarding request
rewrite ^ $local_path break;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
# fix host (implicit) and add prefix on redirects
proxy_redirect $SPARK_MASTER $prefix;
}
}
{noformat}


was (Author: okoethibm):
Added a pull request. Works for me with the following configuration:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/

nginx front-end proxy setup:

server {
listen 9000;
set $SPARK_MASTER http://<spark-master-host>:8080;

# redirect master UI path without terminating slash,
# so that relative URLs are resolved correctly
location ~ ^(?<prefix>/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}

# split spark UI path into prefix and local path within master UI
location ~ ^(?<prefix>/path/to/spark)(?<local_path>/.*) {
# strip prefix when forwarding request
rewrite ^ $local_path break;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
# fix host (implicit) and add prefix on redirects
proxy_redirect $SPARK_MASTER $prefix;
}
}






[jira] [Comment Edited] (SPARK-20044) Support Spark UI behind front-end reverse proxy using a path prefix

2017-03-28 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944778#comment-15944778
 ] 

Oliver Koeth edited comment on SPARK-20044 at 3/28/17 8:41 AM:
---

Added a pull request. Works for me with the following configuration:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/

nginx front-end proxy setup:

{noformat}
server {
listen 9000;
set $SPARK_MASTER http://<spark-master-host>:8080;

# redirect master UI path without terminating slash,
# so that relative URLs are resolved correctly
location ~ ^(?<prefix>/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}

# split spark UI path into prefix and local path within master UI
location ~ ^(?<prefix>/path/to/spark)(?<local_path>/.*) {
# strip prefix when forwarding request
rewrite ^ $local_path break;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
# fix host (implicit) and add prefix on redirects
proxy_redirect $SPARK_MASTER $prefix;
}
}
{noformat}



was (Author: okoethibm):
Added a pull request. Works for me with the following configuration:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/

nginx front-end proxy setup:
server {
listen 9000;
set $SPARK_MASTER http://<spark-master-host>:8080;

# redirect master UI path without terminating slash,
# so that relative URLs are resolved correctly
location ~ ^(?<prefix>/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}

# split spark UI path into prefix and local path within master UI
location ~ ^(?<prefix>/path/to/spark)(?<local_path>/.*) {
# strip prefix when forwarding request
rewrite ^ $local_path break;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
# fix host (implicit) and add prefix on redirects
proxy_redirect $SPARK_MASTER $prefix;
}
}





[jira] [Commented] (SPARK-20044) Support Spark UI behind front-end reverse proxy using a path prefix

2017-03-28 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944778#comment-15944778
 ] 

Oliver Koeth commented on SPARK-20044:
--

Added a pull request. Works for me with the following configuration:
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/

nginx front-end proxy setup:
{noformat}
server {
listen 9000;
set $SPARK_MASTER http://<spark-master-host>:8080;

# redirect master UI path without terminating slash,
# so that relative URLs are resolved correctly
location ~ ^(?<prefix>/path/to/spark$) {
return 302 $scheme://$host:$server_port$prefix/;
}

# split spark UI path into prefix and local path within master UI
location ~ ^(?<prefix>/path/to/spark)(?<local_path>/.*) {
# strip prefix when forwarding request
rewrite ^ $local_path break;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
# fix host (implicit) and add prefix on redirects
proxy_redirect $SPARK_MASTER $prefix;
}
}
{noformat}

> Support Spark UI behind front-end reverse proxy using a path prefix
> ---
>
> Key: SPARK-20044
> URL: https://issues.apache.org/jira/browse/SPARK-20044
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: Oliver Koeth
>Priority: Minor
>  Labels: reverse-proxy, sso
>
> Purpose: allow running the Spark web UI behind a reverse proxy with URLs 
> prefixed by a context root, like www.mydomain.com/spark. In particular, this 
> allows accessing multiple Spark clusters through the same virtual host, 
> distinguished only by context root, like www.mydomain.com/cluster1 and 
> www.mydomain.com/cluster2, and it allows running the Spark UI in a common 
> cookie domain (for SSO) with other services.
> [SPARK-15487] introduced some support for front-end reverse proxies by 
> allowing all Spark UI requests to be routed through the master UI as a single 
> endpoint, and also added a spark.ui.reverseProxyUrl setting to describe 
> another proxy sitting in front of Spark. However, as noted in the comments on 
> [SPARK-15487], this mechanism does not currently work if the reverseProxyUrl 
> includes a context root like the examples above: most links generated by the 
> Spark UI are absolute-path URLs (like /proxy/app-"id"/...) that do not 
> account for a path prefix (context root) and work only if the Spark UI "owns" 
> the entire virtual host. In fact, the only place in the UI where the 
> reverseProxyUrl seems to be used is the back-link from the worker UI to the 
> master UI.
> The discussion on [SPARK-15487] proposed opening a new issue for the problem, 
> but that does not seem to have happened, so this issue aims to address the 
> remaining shortcomings of spark.ui.reverseProxyUrl.
> The problem can be partially worked around by rewriting content in a 
> front-end proxy, prefixing src="/..." or href="/..." links with the context 
> root. However, detecting and patching URLs in HTML output is not a robust 
> approach and breaks down for URLs included in custom REST responses. E.g. the 
> "allexecutors" REST call used from the Spark 2.1.0 application/executors page 
> returns links for log viewing that point to the worker UI and do not work in 
> this scenario.
> This issue proposes to honor spark.ui.reverseProxyUrl throughout Spark UI URL 
> generation. Experiments indicate that most of this can be achieved simply by 
> prepending spark.ui.reverseProxyUrl to the existing spark.ui.proxyBase 
> system property. Beyond that, the places that require adaptation are:
> - worker and application links in the master web UI
> - web UI URLs returned by REST interfaces
> Note: returned redirect Location headers do not seem to need adaptation, 
> since URL rewriting for these is commonly done in front-end proxies and 
> has a well-defined interface
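The partial workaround described above, prefixing absolute src/href links in a front-end proxy, can be sketched in a few lines of Python (a hypothetical illustration only; real proxies would use a content-rewrite module). It also shows why the approach misses absolute URLs embedded in REST/JSON responses:

```python
import re

def prefix_html_links(body, context_root):
    """Naive content rewrite: prefix absolute src=/href= attributes
    with the context root, as a front-end proxy workaround might."""
    return re.sub(r'\b(src|href)="/', r'\1="%s/' % context_root, body)

html = '<a href="/proxy/app-1/jobs/">jobs</a> <img src="/static/spark-logo.png">'
print(prefix_html_links(html, "/spark"))

# A JSON response carries no src=/href= markers, so the same rewrite
# leaves its absolute URLs untouched -- the robustness problem noted above.
json_body = '{"executorLogs": {"stdout": "/logPage/?appId=app-1"}}'
print(prefix_html_links(json_body, "/spark"))
```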



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15487) Spark Master UI to reverse proxy Application and Workers UI

2017-03-22 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936501#comment-15936501
 ] 

Oliver Koeth commented on SPARK-15487:
--

Seems the follow-up issue was never opened. I created [SPARK-20044] to address 
the problems with running behind a site proxy such as www.mydomain.com/spark

> Spark Master UI to reverse proxy Application and Workers UI
> ---
>
> Key: SPARK-15487
> URL: https://issues.apache.org/jira/browse/SPARK-15487
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Gurvinder
>Assignee: Gurvinder
>Priority: Minor
> Fix For: 2.1.0
>
>
> Currently when running in Standalone mode, the Spark UI's links to workers and 
> application drivers point to internal/protected network endpoints. So 
> to access the worker/application UIs, the user's machine has to connect to a 
> VPN or have direct access to the internal network.
> Therefore the proposal is to make the Spark master UI reverse proxy this 
> information back to the user, so that only the Spark master UI needs to be 
> exposed to the internet. 
> The minimal change can be done by adding another route, e.g. 
> http://spark-master.com/target/<ip:port>/, so when a request goes to target, 
> the ProxyServlet kicks in, takes the <ip:port>, forwards the request to it, 
> and sends the response back to the user.
> More information about the discussion of this feature can be found in this 
> mailing list thread: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/spark-on-kubernetes-tc17599.html
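The /target/<ip:port>/ route described above can be illustrated with a small path parser (a hypothetical Python sketch of the routing idea, not the actual ProxyServlet code):

```python
import re

# Parse "/target/<host:port>/<rest>" as the proposed proxy route would:
# extract the backend endpoint and the path to forward to it.
TARGET_ROUTE = re.compile(r"^/target/(?P<endpoint>[^/]+)(?P<rest>/.*)?$")

def parse_target_route(path):
    """Return (endpoint, forward_path) or None if the route does not match."""
    m = TARGET_ROUTE.match(path)
    if m is None:
        return None
    return m.group("endpoint"), m.group("rest") or "/"

print(parse_target_route("/target/10.0.0.5:8081/logPage"))
```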






[jira] [Commented] (SPARK-20044) Support Spark UI behind front-end reverse proxy using a path prefix

2017-03-22 Thread Oliver Koeth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936482#comment-15936482
 ] 

Oliver Koeth commented on SPARK-20044:
--

I tried a few (actually 5) experimental changes, see 
https://github.com/okoethibm/spark/commit/cf889c75be0db938c91695046aa297558217c2c3
With just these changes, I got the Spark UI to run behind nginx with a path 
prefix, and all the UI links that I tried (master, worker, and a running app) 
worked fine.
I have probably still missed some places that need adjusting, but the 
improvement does not seem to require lots of modifications all over the place.

> Support Spark UI behind front-end reverse proxy using a path prefix
> ---
>
> Key: SPARK-20044
> URL: https://issues.apache.org/jira/browse/SPARK-20044
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: Oliver Koeth
>Priority: Minor
>  Labels: reverse-proxy, sso
>


