Hi,

If you are using the Pipeline API, you do not need to map features back to
documents.
Your input column (the document text) is not changed by HashingTF; the
transformer only adds a new column with the feature vectors.
If you want to do information retrieval with Spark, I suggest using RDDs
rather than the pipeline.
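
On mapping features back to words: hashing is one-way, so as far as I know
the reverse index has to be kept separately. Below is a minimal sketch in
plain Python (not Spark code) of the idea. The names bucket, hashing_tf and
build_reverse_index, the use of zlib.crc32, and the tiny NUM_FEATURES value
are all assumptions made for illustration; Spark's HashingTF uses its own
hash function and a much larger default feature count, so the indices here
will not match Spark's.

```python
import zlib
from collections import defaultdict

# Hypothetical, deliberately small so that collisions are easy to observe.
NUM_FEATURES = 16


def bucket(term, num_features=NUM_FEATURES):
    """Map a term to a bucket index with a stable hash (crc32 is an assumption)."""
    return zlib.crc32(term.encode("utf-8")) % num_features


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Return a sparse term-frequency vector as {bucket index: count}."""
    vec = defaultdict(int)
    for term in tokens:
        vec[bucket(term, num_features)] += 1
    return dict(vec)


def build_reverse_index(docs, num_features=NUM_FEATURES):
    """Return {bucket index: set of terms}; colliding terms share one entry."""
    index = defaultdict(set)
    for tokens in docs:
        for term in tokens:
            index[bucket(term, num_features)].add(term)
    return dict(index)


docs = [
    "spark makes text classification easy".split(),
    "hashing maps words to buckets".split(),
]
vectors = [hashing_tf(tokens) for tokens in docs]
reverse_index = build_reverse_index(docs)

# Show, for the first document, which words each non-zero feature may stand for.
for idx, count in sorted(vectors[0].items()):
    print(idx, count, sorted(reverse_index[idx]))
```

When two words land in the same bucket, the reverse index lists both of
them, so a feature index can only be mapped back to a set of candidate
words, not to a single word.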

On Fri, Jan 1, 2016 at 2:20 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> Hi
>
> I am working on a proof of concept. I am trying to use Spark to classify
> some documents. I am using Tokenizer and HashingTF to convert the documents
> into vectors. Is there an easy way to map features back to words, or do I
> need to maintain the reverse index myself? I realize there is a chance
> some words map to the same bucket.
>
> Kind regards
>
> Andy
>
>


-- 
Hayri Volkan Agun
PhD. Student - Anadolu University
