Liang-Chi Hsieh created SPARK-3327:
--------------------------------------

             Summary: Make broadcasted value mutable for caching useful information
                 Key: SPARK-3327
                 URL: https://issues.apache.org/jira/browse/SPARK-3327
             Project: Spark
          Issue Type: New Feature
            Reporter: Liang-Chi Hsieh


When implementing some algorithms, it is helpful to be able to cache useful
information for later use.

Specifically, we would like to perform operation "A" on each partition of the
data, which updates some variables. We then want to run operation "B" on the
same data, and "B" uses the variables updated by "A".

One example is Liblinear on Spark from Dr. Lin's group. The problem is discussed
in Section IV.D of the paper "Large-scale Logistic Regression and Linear Support
Vector Machines Using Spark."

Broadcast variables currently meet only part of this need: broadcasting reduces
communication costs. However, because broadcast variables cannot be modified,
they do not solve the problem. Instead, we may need to collect the updated
variables back to the driver and broadcast them again before running the next
data operation.
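
A minimal sketch of the current workaround, assuming Spark's Scala RDD API. The
job is hypothetical: operation "A" computes per-partition (sum, count) pairs,
the driver collects and combines them, and the combined mean is broadcast again
so operation "B" can subtract it from every value. The object and method names
(RebroadcastSketch, centered) are illustrative only, and the mean stands in for
whatever "useful information" a real algorithm would carry between operations.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

object RebroadcastSketch {
  /** Hypothetical two-pass job: subtract the global mean from every value. */
  def centered(sc: SparkContext, values: Seq[Double], numPartitions: Int): Array[Double] = {
    val data = sc.parallelize(values, numPartitions).cache()

    // Operation "A": each partition computes a partial (sum, count).
    // Tasks cannot write into a broadcast variable, so the partial
    // results have to be returned to the driver with collect().
    val partials: Array[(Double, Long)] = data.mapPartitions { it =>
      var sum = 0.0
      var count = 0L
      it.foreach { x => sum += x; count += 1 }
      Iterator((sum, count))
    }.collect()

    // Driver: combine the partial results into the value that "B" needs.
    val (total, n) = partials.foldLeft((0.0, 0L)) {
      case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2)
    }

    // Broadcast the combined value again so every task of "B" can read it.
    val mean: Broadcast[Double] = sc.broadcast(total / n)

    // Operation "B": uses the value produced by "A".
    val result = data.map(x => x - mean.value).collect()

    // Release cached data and executor-side broadcast copies.
    mean.unpersist()
    data.unpersist()
    result
  }
}
```

With a local SparkContext, RebroadcastSketch.centered(sc, Seq(1.0, 2.0, 3.0, 4.0), 2)
should return Array(-1.5, -0.5, 0.5, 1.5). In an iterative algorithm this
collect-and-rebroadcast round trip is repeated on every iteration, which is the
overhead described above.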

I would like to add an interface to broadcast variables that makes them mutable,
so that later data operations can reuse the updated values.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
