Github user thvasilo commented on a diff in the pull request:
https://github.com/apache/spark/pull/4916#discussion_r26000373
--- Diff: ec2/spark_ec2.py ---
@@ -1259,6 +1259,15 @@ def real_main():
cluster_instances=(master_nodes + slave_nodes),
cluster_state='ssh-ready'
)
+
+ # Determine types of running instances
+ existing_master_type = master_nodes[0].instance_type
+ existing_slave_type = slave_nodes[0].instance_type
+ if existing_master_type == existing_slave_type:
--- End diff --
@shivaram I think you are correct, the opts.master_instance_type is not
accessed at any point in setup_cluster, so this is redundant.
But I think this could lead to another bug. Since the master gets set up
using the slaves' instance type (line
[860](https://github.com/thvasilo/spark/blob/SPARK-6188%5D-Instance-types-can-be-mislabeled-when-re-starting-cluster-with-default-arguments/ec2/spark_ec2.py#L860)),
it might be mis-configured in the same way. But perhaps this is not a problem,
as the master should not be an executor as well.
In that sense deploy_files is a bit problematic currently, as it will
deploy files on the master that will be rsynced to the slaves, so in one sense
it does the right thing by deploying the files configured as they should be on
the slaves, but at the same time, the configuration of the master itself is
done with the instance type of the slaves, which could be different from the
masters'.
Example:
1. User launches cluster, slaves are m3.xlarge, master is m3.large.
2. User stops and restarts cluster.
3. Files deployed on master are configured for m3.xlarge, these get rsynced
to the slaves through setup.sh.
4. Now the slaves have correct configuration files, but the master has been
configured as an m3.xlarge instance.
Do you think this can be a problem? Other wise I can just remove the if
clause as the setting is unnecessary with the current code.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]