Github user thvasilo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4916#discussion_r26000373
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1259,6 +1259,15 @@ def real_main():
                 cluster_instances=(master_nodes + slave_nodes),
                 cluster_state='ssh-ready'
             )
    +
    +        # Determine types of running instances
    +        existing_master_type = master_nodes[0].instance_type
    +        existing_slave_type = slave_nodes[0].instance_type
    +        if existing_master_type == existing_slave_type:
    --- End diff --
    
    @shivaram I think you are correct, the opts.master_instance_type is not 
accessed at any point in setup_cluster, so this is redundant.
    
    But I think this could lead to another bug. Since the master gets set up 
using the slaves' instance type (line 
[860](https://github.com/thvasilo/spark/blob/SPARK-6188%5D-Instance-types-can-be-mislabeled-when-re-starting-cluster-with-default-arguments/ec2/spark_ec2.py#L860)),
 it might be mis-configured in the same way. But perhaps this is not a problem, 
as the master should not be an executor as well.
    
    In that sense deploy_files is a bit problematic currently, as it will 
deploy files on the master that will be rsynced to the slaves, so in one sense 
it does the right thing by deploying the files configured as they should be on 
the slaves, but at the same time, the configuration of the master itself is 
done with the instance type of the slaves, which could be different from the 
masters'.
    
    Example:
    1. User launches cluster, slaves are m3.xlarge, master is m3.large.
    2. User stops and restarts cluster.
    3. Files deployed on master are configured for m3.xlarge, these get rsynced 
to the slaves through setup.sh.
    4. Now the slaves have correct configuration files, but the master has been 
configured as an m3.xlarge instance.
    
    Do you think this can be a problem? Other wise I can just remove the if 
clause as the setting is unnecessary with the current code.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to