Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/182#issuecomment-38135323
  
    On Wed, Mar 19, 2014 at 8:42 PM, Aaron Davidson
    <[email protected]> wrote:
    
    > This *is* an incompatible change, which is why it is targeted for 1.0. It
    > is my opinion that this is how it should've been in the first place; since
    > Spark is inherently a distributed system, it doesn't make sense for an
    > out-of-the-box "spark-shell" to run on one thread.
    >
    
    If we were adding support for local mode today, a variant of this PR would
    be a good default to choose.
    I am not convinced of the need to break compatibility in this case. Yes,
    we have the ability to break compatibility, but we should be careful while
    doing so, particularly when the behaviour was explicitly documented.
    
    There is a bunch of thread-unsafe code which uses local (it should have
    been local[1], but the two were documented to be the same until now)
    and/or cluster mode with num cores == 1.
    
    
    
    > We use availableProcessors() elsewhere in the Spark codebase, and here it
    > is used only as a reasonable default. "local" mode is only intended for
    > testing purposes, so the impact and severity of this change is limited.
    >
    
    We use that API in only one other place, which is pretty similar to this
    and is also fairly dodgy.
    
    'local' is very commonly used by our users, so it is not restricted to
    testing (if it were just for testing, modifying the tests to use a higher
    value would be the solution).
    If the intention is to ensure good out-of-the-box use of all cores of a
    system when users try or play with Spark, then it would be better to point
    users to local[N] and note via a log message that we are running with a
    single thread.
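    To make that suggestion concrete, here is a purely illustrative sketch
    (not Spark's actual code; the class name, method name, and log wording
    are all hypothetical): plain "local" stays single-threaded but logs a
    hint pointing at local[N], while local[N] / local[*] opt in to more
    threads explicitly.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch of parsing a "local"/"local[N]" master string.
    public class LocalMaster {
        private static final Pattern LOCAL_N =
            Pattern.compile("local\\[(\\d+|\\*)\\]");

        public static int parseLocalThreads(String master) {
            if (master.equals("local")) {
                // Preserve the documented behaviour (one thread), but tell
                // the user how to opt in to more cores.
                System.err.println(
                    "Running with a single thread; use local[N], e.g. local["
                    + Runtime.getRuntime().availableProcessors()
                    + "], to use more cores.");
                return 1;
            }
            Matcher m = LOCAL_N.matcher(master);
            if (m.matches()) {
                String n = m.group(1);
                // local[*] asks for all cores explicitly; local[N] asks for N.
                return n.equals("*")
                    ? Runtime.getRuntime().availableProcessors()
                    : Integer.parseInt(n);
            }
            throw new IllegalArgumentException("Not a local master: " + master);
        }

        public static void main(String[] args) {
            System.out.println(parseLocalThreads("local"));    // prints 1
            System.out.println(parseLocalThreads("local[4]")); // prints 4
        }
    }
    ```

    The point of the sketch is that all-cores behaviour is something the user
    asks for, rather than a silent change to what "local" means.
    
    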
    
    
    
    > (Also I'd like to point out that the tests that needed to be modified were
    > actually making false assumptions about partition ordering that just
    > happened to work out since it was using 1 thread.)
    >
    
    That points to probably broken tests.
    We should not assume the number of workers a test will be run on (except
    for scheduler tests, etc., of course!).
    
    
    Regards,
    Mridul
    
    

