Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/5294#issuecomment-88254374
Yes, that is basically the scenario, although I would expect it to start out
with Spark packaged against hadoopA and running on hadoopA; then hadoopB is deployed, and
Spark built against hadoopA continues to run just fine on hadoopB.
This allows for separate deployments of Hadoop and Spark. Otherwise you
have to make sure Spark and Hadoop get deployed everywhere at the same time and
that everyone upgrades to the new version of Spark.
Yes, it did happen, which is what led me to file this JIRA and to plan on
changing how we package Spark internally. I don't think it will happen very
often, but I also don't want it to cause an issue on a production system.
MapReduce has this same issue, and we actually package it fully separately to
prevent this. With Hadoop now supporting rolling upgrades, this is more of a
concern.
Personally, I see things moving toward more isolated environments where we
aren't forcing Hadoop and its dependencies to be included in everything that
runs on YARN. Many users have issues with conflicting dependencies, and having
this config should at least give them the option.
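For illustration only (this is not the configuration added by this PR): the same
isolation idea shows up on the application side, where a job keeps Hadoop and Spark
jars out of its own assembly and picks up whatever the cluster provides at runtime.
A minimal sbt sketch, with artifact IDs and version numbers chosen purely as examples:

    // build.sbt -- illustrative sketch; versions are examples, not recommendations
    name := "example-yarn-app"

    scalaVersion := "2.10.4"

    // Mark Spark and Hadoop as "provided" so the application jar does not bundle
    // them; the versions already deployed on the cluster are used at runtime.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
      "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"
    )

The same reasoning applies one level down: if Spark itself does not bake Hadoop and
its dependencies into what it ships to YARN, the cluster's Hadoop can be upgraded
independently of the Spark deployment.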