Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/5085#issuecomment-84222414
  
    When I said "before" I meant the code currently in branch-1.3. It's faster 
now, even when using the assembly for both invocations, because it still loads 
the assembly fewer times than before (which could be up to 3 times, IIRC), it 
avoids invoking `java` just to check its version, and Java code is generally 
faster than bash code.
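A minimal sketch of why the version check gets cheaper, assuming the launcher only needs the JVM's major version: code already running inside a JVM can read `java.specification.version` directly instead of forking a separate `java -version` process and parsing its output. The class and method names below are hypothetical and not taken from the Spark launcher.

```java
// Hypothetical sketch: read the running JVM's version in-process instead of
// spawning `java -version` from a shell script and parsing its output.
public final class JavaVersionCheck {

    // Converts a "java.specification.version" value into a major version.
    // Pre-Java 9 releases use the "1.x" scheme ("1.7" -> 7, "1.8" -> 8);
    // later releases use the bare major number ("11" -> 11).
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        // No child process is started: the property is already set in this JVM.
        String spec = System.getProperty("java.specification.version");
        System.out.println("Running on Java " + majorVersion(spec));
    }
}
```

The point of the sketch is the absence of a process fork, not the parsing details; a real launcher may need finer-grained version information.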
    
    The added overhead of using the assembly for the first invocation (instead 
of the smaller jar) is not enough to make startup as slow as before; it's 
still considerably faster (400ms before vs. 150ms now on my system). So that's 
not really a big source of worry here, and the code changes shouldn't be too 
bad.
    
    Also, I'd argue that most of the startup time of spark-shell is actually 
spent loading all the Scala classes (which the launcher invocation doesn't do, 
since it's pure Java) and contacting the cluster manager. That should make the 
overhead of the startup script mostly insignificant.
    
    The rest of the change just builds on top of this one; instead of having 
code in both the scripts and the library to find the assembly, it's easier to 
just tell the library where the assembly is (since the scripts already know).
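A minimal sketch of the "tell the library where the assembly is" approach, assuming the launch script exports the assembly path in an environment variable before starting the Java launcher. The variable name `SPARK_ASSEMBLY_PATH` and the `AssemblyLocator` class are hypothetical illustrations, not actual Spark names; the environment map is passed in as a parameter so the lookup can be exercised without touching the real process environment.

```java
import java.util.Map;

// Hypothetical sketch: the shell script already located the assembly jar, so
// it exports the path and the Java side simply reads it back instead of
// duplicating the search logic.
public final class AssemblyLocator {

    // Illustrative variable name only; not a real Spark environment variable.
    static final String ENV_ASSEMBLY = "SPARK_ASSEMBLY_PATH";

    // Returns the assembly path provided by the script, or fails loudly if the
    // launcher was started without it.
    static String findAssembly(Map<String, String> env) {
        String path = env.get(ENV_ASSEMBLY);
        if (path == null || path.isEmpty()) {
            throw new IllegalStateException(
                ENV_ASSEMBLY + " is not set; was the launcher started from the script?");
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println("Using assembly " + findAssembly(System.getenv()));
    }
}
```

Failing when the variable is missing, rather than falling back to a directory scan, keeps the assembly-finding logic in one place (the script), which is the simplification described above.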

