GitHub user viirya commented on the pull request:
https://github.com/apache/spark/pull/2217#issuecomment-54278826
@rxin, I need a way to modify broadcast variables locally and keep the
modified copies for later use. The locally modified variables store values
calculated at an early stage of a machine learning algorithm; those values
are then used in later stages.
In particular, the algorithm computes a different parameter P for each data
partition using mapPartitionsWithIndex in its first stage. In later stages,
the algorithm uses parameter P to perform learning.
With the current broadcast variables, I need to collect the values calculated
in the earlier stage and re-broadcast them to later stages.
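
To make the collect-and-re-broadcast workaround concrete, here is a minimal
sketch of it, not the code from the pull request. The object name, the
computeP helper (a partition mean standing in for the real parameter P), the
sample data, and the local[2] master are all assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PerPartitionParams {
  // Hypothetical stand-in for the per-partition parameter P:
  // here it is simply the mean of the partition's values.
  def computeP(values: Iterator[Double]): Double = {
    val (sum, n) = values.foldLeft((0.0, 0L)) { case ((s, c), v) => (s + v, c + 1) }
    if (n == 0) 0.0 else sum / n
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup, for illustration only.
    val conf = new SparkConf().setAppName("per-partition-params").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Cache so both stages see the same partitions.
      val data = sc.parallelize(1 to 8, 2).map(_.toDouble).cache()

      // Stage 1: compute P for each partition, keyed by partition index,
      // and collect the results back to the driver.
      val pByPartition: Map[Int, Double] = data
        .mapPartitionsWithIndex { (idx, it) => Iterator((idx, computeP(it))) }
        .collect()
        .toMap

      // Re-broadcast the collected values so later stages can read them.
      val pBroadcast = sc.broadcast(pByPartition)

      // Stage 2: each task looks up the P that belongs to its own partition.
      val adjusted = data.mapPartitionsWithIndex { (idx, it) =>
        val p = pBroadcast.value(idx)
        it.map(_ - p)
      }

      adjusted.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The sketch relies on partition indices staying the same between the two
stages, which holds here because both stages run over the same RDD. The cost
is the round trip through the driver: every P is collected and then shipped
to all executors, even though each task only needs its own entry.
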
Since broadcast variables are currently immutable, the earlier stage cannot
modify them locally for different partitions of data. So I am wondering if we
can provide a mechanism that allows tasks to have locally mutable values for
different partitions. That is why I modified the broadcast interface to
provide such a function. However, maybe it should be separated from the
broadcast module.