Github user holdenk commented on the issue:
    So when you say "second pass over the data" - from looking at this, it seems 
like it could be done with just a second map that looks up the predictions 
against the already computed cluster centers, not a stage boundary. So it probably 
wouldn't be all that expensive, given how Spark does pipelining, unless I'm 
missing something.
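    The "second map" idea could be sketched roughly like this (a hypothetical, plain-Python illustration - the names `centers`, `points`, and `nearest_center` are made up for the example and are not from the Spark codebase; in Spark this would be a narrow map transformation that pipelines with upstream work rather than forcing a shuffle):

```python
import math

# Cluster centers already computed by the fitted model (illustrative values).
centers = [(0.0, 0.0), (10.0, 10.0)]

# Data points to evaluate against those centers.
points = [(1.0, 1.0), (9.0, 11.0), (0.5, -0.5)]

def nearest_center(point):
    # Distance from the point to each center; the argmin is the
    # predicted cluster index - no second aggregation pass needed.
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

# The "second map": a per-record lookup against the precomputed centers.
predictions = [nearest_center(p) for p in points]
```

    Since each record only needs the (small, broadcastable) list of centers, this stays a per-partition map rather than a new stage.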
    This would mean that people would have to set the cluster centers from 
their model when they wanted to do that type of evaluation. But given that the 
evaluator wouldn't be able to recover the cluster centers from a test set that 
differed from the training set, I think that would be reasonable.
    That being said, it's been a while since I've looked at the evaluator code, 
so I could be coming out of left field.