GitHub user LeoIV commented on the issue:
https://github.com/apache/spark/pull/18636
The problem emerges when you have built a whole pipeline. You have a
set of documents you want to classify. These documents have some additional
features, and they are preprocessed in the pipeline. When you reach the
Word2Vec stage, you want to vectorize your documents. However, you see that
your word vectors perform poorly, and you want to improve them by training on
additional documents. You don't want these documents to go through the whole
pipeline, because they cannot pass the earlier preprocessing steps.
That was my motivation for adding this. It is probably a very rare use case; I
don't know.