[GitHub] spark issue #18636: added support word2vec training with additional data

LeoIV Wed, 20 Sep 2017 00:38:26 -0700

Github user LeoIV commented on the issue:

    https://github.com/apache/spark/pull/18636
  
    The problem emerges when you have built a whole pipeline. You have a 
set of documents you want to classify. These documents have some additional 
features, and they are preprocessed in the pipeline. At the Word2Vec stage, 
you want to vectorize your documents. However, your word vectors perform 
poorly, and you want to improve them by training on additional documents. You 
don't want these documents to be part of the whole pipeline, because they 
cannot pass the earlier preprocessing steps.
    
    That was my motivation for adding this. It may be a very rare use case; I 
don't know.



