[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2015-02-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321236#comment-14321236
 ] 

Apache Spark commented on SPARK-2313:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4603

> PySpark should accept port via a command line argument rather than STDIN
> 
>
> Key: SPARK-2313
> URL: https://issues.apache.org/jira/browse/SPARK-2313
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Patrick Wendell
>
> Relying on stdin is a brittle mechanism and has broken several times in the 
> past. From what I can tell this is used only to bootstrap worker.py one time. 
> It would be strictly simpler to just pass it as a command line argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2015-02-12 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318221#comment-14318221
 ] 

Matthew Farrellee commented on SPARK-2313:
--

that'd work, also requires a py4j change




[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2014-11-24 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223583#comment-14223583
 ] 

Davies Liu commented on SPARK-2313:
---

[~farrellee] The new approach could be:

1) bind a socket to a random port in Python,
2) pass the port into the JVM, which connects to it,
3) the Java gateway binds to a random port,
4) pass that port back via the socket (created in 1),
5) read the port from the socket (created in 1), then close it.

The logic will be similar to the current one; the cost is creating a temporary socket.
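
The five steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from any pull request: the names fake_gateway and launch_and_discover_port are hypothetical, a Python thread stands in for the JVM, and the port is sent as a 4-byte big-endian integer purely for the sake of the example.

```python
import socket
import struct
import threading


def fake_gateway(callback_port):
    """Hypothetical stand-in for the JVM side (steps 2 to 4)."""
    # Step 3: the "gateway" binds to an ephemeral port of its own.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    gateway_port = server.getsockname()[1]
    # Steps 2 and 4: connect back to the launcher and report the port.
    with socket.create_connection(("127.0.0.1", callback_port), timeout=10) as conn:
        conn.sendall(struct.pack("!i", gateway_port))
    return server


def launch_and_discover_port():
    # Step 1: bind a temporary listening socket to a random port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(10)
    callback_port = listener.getsockname()[1]

    # Step 2: hand the port to the child (a thread here, a JVM in reality).
    result = {}
    child = threading.Thread(
        target=lambda: result.update(server=fake_gateway(callback_port))
    )
    child.start()

    # Step 5: read exactly 4 bytes from the temporary socket, then close it.
    conn, _ = listener.accept()
    conn.settimeout(10)
    with conn:
        data = b""
        while len(data) < 4:
            chunk = conn.recv(4 - len(data))
            if not chunk:
                raise RuntimeError("gateway closed the connection early")
            data += chunk
    listener.close()
    child.join()
    return struct.unpack("!i", data)[0], result["server"]


if __name__ == "__main__":
    port, server = launch_and_discover_port()
    print("gateway is listening on port", port)
    server.close()
```

Nothing in this exchange touches stdout, so stray output from the launcher scripts cannot corrupt the port.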




[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2014-11-24 Thread Lv, Qi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222765#comment-14222765
 ] 

Lv, Qi commented on SPARK-2313:
---

I've submitted a patch to fix this issue:
https://github.com/apache/spark/pull/3424







[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2014-11-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222763#comment-14222763
 ] 

Apache Spark commented on SPARK-2313:
-

User 'lvsoft' has created a pull request for this issue:
https://github.com/apache/spark/pull/3424




[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2014-07-16 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063454#comment-14063454
 ] 

Matthew Farrellee commented on SPARK-2313:
--

as this stands, having another communication mechanism for py4j that can be 
controlled by the parent is the proper solution. using something like a domain 
socket may also assist in the return path from py4j (tmp file).

fyi, a recent change pushed all existing output to stderr in the 
spark-class/spark-submit path

i'm not actively working on this



[jira] [Commented] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN

2014-06-28 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046843#comment-14046843
 ] 

Matthew Farrellee commented on SPARK-2313:
--

components involved -
 0. pyspark - python program that initiates a py4j setup when constructing the 
SparkContext (calls launch_gateway from java_gateway.py)
 1. launch_gateway - invokes "o.a.s.d.SparkSubmit pyspark-shell" via 
spark-class via spark-submit, which invokes py4j.GatewayServer
 2. py4j.GatewayServer - py4j specific code that listens on a port and prints 
it to stdout (see GatewayServer.java#L610)
 3. launch_gateway - reads the port from the child's stdout and constructs the 
client side of the py4j channel

comments -
 a. by allowing the child to pick an ephemeral port there's a guarantee of 
success (except for the case of no available ports)
 b. having the parent pick a port and pass it to the child introduces a risk 
that when the child tries to use the port it will no longer be available. thus, 
not strictly simpler to keep the same guarantees that currently exist.
 c. printing the port to stdout from the child (py4j gatewayserver) is the 
intended method for discovery, see 
https://github.com/bartdag/py4j/blob/master/py4j-java/src/py4j/GatewayServer.java#L610
 d. any data on stdout from spark-submit, spark-class or o.a.s.d.SparkSubmit 
can interfere with the py4j setup
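
Comments (a) and (b) can be illustrated with a small sketch. It is not Spark or py4j code: the helper names bind_ephemeral and bind_chosen are hypothetical, and a second socket in the same process stands in for an unrelated process that took the port before the child got to it.

```python
import socket


def bind_ephemeral():
    """Comment (a): ask the OS for any free port by binding to port 0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    return sock


def bind_chosen(port):
    """Comment (b): try to use a port that was picked earlier by the parent.

    Returns the listening socket, or None if the port is already taken.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        return None
    return sock


if __name__ == "__main__":
    # Something else is already listening on the port the parent picked.
    holder = bind_ephemeral()
    picked = holder.getsockname()[1]
    print("child could bind the picked port:", bind_chosen(picked) is not None)
    holder.close()
```

With port 0 the bind and the choice of port happen in one step, which is why the child-picks approach has no window in which the port can be lost.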

because of (d), i consider this fragile: well-meaning, unrelated changes are 
likely to break it.
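
A minimal sketch of the failure mode in (d), assuming the parent simply parses the first line of the child's stdout as an integer. The helper read_port_from_child, the two child programs, the port number 12345 and the stray message are all made up for illustration; the real launcher keeps the child running and reads a single line, whereas this sketch waits for the child to exit.

```python
import subprocess
import sys

# Hypothetical children: one prints only the port, the other prints an
# unrelated message first, as a wrapper script or logger might.
CLEAN_CHILD = "print(12345)"
NOISY_CHILD = "print('some unrelated startup message'); print(12345)"


def read_port_from_child(child_source):
    """Run a Python child and parse the first line of its stdout as a port."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    first_line = out.splitlines()[0]
    # Raises ValueError if anything other than the port was printed first.
    return int(first_line)


if __name__ == "__main__":
    print("clean child:", read_port_from_child(CLEAN_CHILD))
    try:
        read_port_from_child(NOISY_CHILD)
    except ValueError as exc:
        print("noisy child broke discovery:", exc)
```

The child code that prints the port is unchanged in both cases; only the extra line on stdout differs, which is the sense in which unrelated changes can break the setup.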

i'll take a look at this
