Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/3916#issuecomment-75498418
  
    Hey @vanzin, okay, I took another quite long look (sorry, I was delayed this 
week due to Strata) and I have embarrassingly few useful comments given how long 
I looked at it. Overall, the changes you made all seem great. Unfortunately, 
since spark-submit as it stands before your patch is a fairly wide interface, 
it's difficult for me to conceive of every possible combination of settings, 
environments, etc. that could be broken by this. I think we'll just have to 
merge it and hope it gets a substantial amount of user testing.
    
    The main thing left for me is the proposal to expose this as a public 
interface to Spark for application developers. The main issue I see is that it 
just won't be backwards compatible as it stands now, and this is something 
where I think compatibility is pretty important. One main use case I see here 
is a third-party application that wants a nice way of submitting Spark 
applications to users' clusters that already have Spark installed, so it 
bundles this library with the app. At least, that's the main reason I've heard 
users directly ask for this feature. If they have to grab a newer version of 
this library for every different Spark version users might have installed, 
that's a big pain for them. And it's the exact opposite of the API guarantees 
the rest of Spark provides. And there is an alternative that _is_ backwards 
compatible, which is for them to just script around `spark-submit`. So users 
are in a weird place where there are two options and neither is strictly 
optimal.
    
    To me that seems like a blocker to exposing it in the current form without 
exploring more options.
    
    If we want to make this backwards compatible, we'd need to minimize the 
interface between the publicly exposed library and the supplied Spark 
distribution. If the interface from the user app is that we fork a subprocess 
when they call `launch()`, could that subprocess just invoke the spark-submit 
script in the distribution, which would then itself compute the classpath, 
etc.? We'd just have to detect whether to invoke the Windows or Unix version, 
but that seems pretty doable. Could that work?
    
    As a point of process, I'd be happy to explore various options for making 
this work by just merging a version where this is private and then building on 
top of that, to avoid staleness and allow for ample testing. I really do think 
it's a good idea to have a programmatic version of spark-submit.

