Hi Yanbo,
thanks for the info. Is this likely to change in the (near :) ) future?
Being able to call this function only on local data (i.e. not inside an
RDD transformation) seems to be a rather serious limitation.
cheers,
Tomasz
On 02.01.2016 09:45, Yanbo Liang wrote:
Hi Tomasz,
The GMM is bound to a peer Java GMM object, so it needs a reference to the
SparkContext.
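A minimal sketch of why such a reference matters: functions passed to map() are pickled and shipped to the workers, and PySpark's SparkContext refuses to be pickled, so anything that holds one cannot be shipped either. The classes FakeSparkContext and JavaBackedModel and the helper can_ship are hypothetical stand-ins, not PySpark API. Plain pickle is used here, whereas PySpark itself serializes closures with a cloudpickle-based serializer.

```python
import pickle


class FakeSparkContext(object):
    """Hypothetical stand-in for SparkContext, which cannot be pickled."""

    def __reduce__(self):
        # Pickling this object always fails, mimicking PySpark's error
        raise Exception("It appears that you are attempting to reference "
                        "SparkContext from a broadcast variable, action, "
                        "or transformation.")


class JavaBackedModel(object):
    """Hypothetical model that, like GaussianMixtureModel, keeps a context."""

    def __init__(self, sc, weights):
        self._sc = sc  # needed to call the peer JVM object
        self.weights = weights


def can_ship(obj):
    """Return True if obj can be pickled, as everything sent to workers must be."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


plain_params = {"weights": [0.5, 0.5], "means": [[0.0], [1.0]]}
model = JavaBackedModel(FakeSparkContext(), [0.5, 0.5])

print(can_ship(plain_params))  # True
print(can_ship(model))         # False
```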
Some MLlib (not ML) models, such as KMeansModel, LinearRegressionModel,
etc., are simple objects, but others refer to the SparkContext. The latter
and their member functions should not be called inside map().
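One possible way around this, sketched under assumptions: extract the fitted parameters from the model on the driver into plain NumPy arrays, and compute the soft assignments in a closure that captures only those arrays. The code below is a hand-written re-implementation of the membership probabilities (component weight times Gaussian density, normalized over components), not MLlib code, so its numbers may differ slightly from predictSoft. The names gaussian_pdf and make_soft_predictor are hypothetical, and the driver-side extraction in the trailing comments assumes the PySpark 1.5 model exposes weights and gaussians (each with mu and sigma).

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Density of the multivariate normal N(mu, sigma) at point x."""
    d = len(mu)
    diff = x - mu
    # Solve sigma * y = diff instead of inverting sigma explicitly
    maha = diff.dot(np.linalg.solve(sigma, diff))
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm


def make_soft_predictor(weights, means, sigmas):
    """Return a closure that holds only plain NumPy arrays (no SparkContext)."""
    weights = np.asarray(weights, dtype=float)
    means = [np.asarray(m, dtype=float) for m in means]
    sigmas = [np.asarray(s, dtype=float) for s in sigmas]

    def predict_soft(x):
        x = np.asarray(x, dtype=float)
        # Weighted density of x under each mixture component
        dens = np.array([w * gaussian_pdf(x, m, s)
                         for w, m, s in zip(weights, means, sigmas)])
        # Normalize to membership probabilities that sum to 1
        return dens / dens.sum()

    return predict_soft


# Driver-side usage (assumed PySpark 1.5 API, not run here):
# predictor = make_soft_predictor(
#     gmm.weights,
#     [g.mu.toArray() for g in gmm.gaussians],
#     [g.sigma.toArray() for g in gmm.gaussians])
# predictionsRDD = labelledpointrddselectedfeatsNansPatched.map(
#     lambda lp: predictor(lp.features.toArray()))
```

Because the closure only references NumPy arrays, it should pickle cleanly; the trade-off is maintaining a small copy of the prediction logic outside MLlib.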
Cheers
Yanbo
2016-01-01 4:12 GMT+08:00 Tomasz Fruboes <[email protected]>:
Dear All,
I'm trying to implement a procedure that iteratively updates an RDD
using results from GaussianMixtureModel.predictSoft. To avoid problems
with the local variable (the obtained GMM) being overwritten in each
pass of the loop, I'm doing the following:
#######################################################
from pyspark.mllib.clustering import GaussianMixture

for i in range(10):
    gmm = GaussianMixture.train(rdd, 2)

    # Factory function: binds the current gmm so that later passes
    # of the loop cannot change what the lambda sees
    def getSafePredictor(unsafeGMM):
        # gaussians is a list of MultivariateGaussian, so collect each mu
        return lambda x: (unsafeGMM.predictSoft(x.features),
                          [g.mu for g in unsafeGMM.gaussians])

    safePredictor = getSafePredictor(gmm)
    predictionsRDD = labelledpointrddselectedfeatsNansPatched.map(safePredictor)
    print(predictionsRDD.take(1))
    # ... rest of code: update rdd with results from predictionsRDD
#######################################################
Unfortunately this ends with:
#######################################################
Exception: It appears that you are attempting to reference
SparkContext from a broadcast variable, action, or transformation.
SparkContext can only be used on the driver, not in code that it run
on workers. For more information, see SPARK-5063.
#######################################################
Any idea why I'm getting this behaviour? I would expect the GMM to be a
"simple" object without a SparkContext in it.
I'm using Spark 1.5.2.
Thanks,
Tomasz
PS: As a workaround, I'm currently doing the following:
########################
# predictSoft is called on the driver with a whole RDD of feature vectors
def getSafeGMM(unsafeGMM):
    return lambda x: unsafeGMM.predictSoft(x)

safeGMM = getSafeGMM(gmm)
predictionsRDD = safeGMM(
    labelledpointrddselectedfeatsNansPatched.map(lambda lp: lp.features))
########################
which works fine. If possible, I would like to avoid this approach,
since it would require another closure over gmm.gaussians later in my
code.
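On the closure question: the factory-function pattern used in the loop above is the usual fix for Python's late binding of closure variables, and a single factory can return the means together with anything else needed. The pure-Python sketch below (function names are hypothetical) only demonstrates the binding behaviour; it does not by itself remove a SparkContext reference, which depends on capturing plain values such as copied means rather than the model object.

```python
def naive_predictors(all_means):
    """Closures built in a loop: `means` is looked up only when called."""
    predictors = []
    for means in all_means:
        # Every lambda shares the same `means` variable, so all of them
        # end up seeing the value from the last iteration
        predictors.append(lambda x: (x, means))
    return predictors


def make_predictor(means):
    """Factory: `means` is a parameter, so each call binds its own value."""
    # Keep a plain, immutable copy instead of a reference to a model object
    frozen = tuple(tuple(m) for m in means)
    return lambda x: (x, frozen)


def factory_predictors(all_means):
    return [make_predictor(means) for means in all_means]


all_means = [[[0.0], [1.0]], [[5.0], [6.0]]]
print(naive_predictors(all_means)[0]("p")[1])    # [[5.0], [6.0]]
print(factory_predictors(all_means)[0]("p")[1])  # ((0.0,), (1.0,))
```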
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
---------------------------------------------------------------------