Hi,

If you are using the Pipeline API, you do not need to map features back to documents. Your input column (the document text) is not changed by HashingTF; the transformer only adds a new column holding the feature vectors, so each vector stays on the same row as its document. If you want to do information retrieval with Spark, I suggest using RDDs rather than the pipeline...
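On mapping features back to words: the hashing trick is one-way, so a reverse index has to be kept separately if you need it. Below is a minimal sketch in plain Python (no Spark) of what that bookkeeping could look like. The names `bucket_of`, `hashing_tf`, `build_reverse_index` and `NUM_FEATURES` are made up for illustration, and CRC32 is only a stand-in hash, not the function Spark's HashingTF uses, so the bucket numbers will not match Spark's output.

```python
import zlib
from collections import defaultdict

# Hypothetical, deliberately small feature space so collisions are easy to see.
NUM_FEATURES = 16


def bucket_of(term, num_features=NUM_FEATURES):
    """Map a term to a bucket index.

    CRC32 is a stand-in hash chosen because it is deterministic across runs;
    it is NOT the hash function used by Spark's HashingTF.
    """
    return zlib.crc32(term.encode("utf-8")) % num_features


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Return a sparse term-frequency vector as {bucket index: count}."""
    counts = defaultdict(int)
    for term in tokens:
        counts[bucket_of(term, num_features)] += 1
    return dict(counts)


def build_reverse_index(docs, num_features=NUM_FEATURES):
    """Record which terms landed in each bucket: {bucket index: set of terms}."""
    reverse = defaultdict(set)
    for tokens in docs:
        for term in tokens:
            reverse[bucket_of(term, num_features)].add(term)
    return reverse


# Two tiny tokenized documents, made up for the example.
docs = [
    "spark makes hashing easy".split(),
    "hashing maps words to buckets".split(),
]

reverse_index = build_reverse_index(docs)
vector = hashing_tf(docs[0])

# Show, for each non-zero feature of the first document, the candidate terms.
for bucket, count in sorted(vector.items()):
    print(bucket, count, sorted(reverse_index[bucket]))
```

A bucket that lists more than one term is a collision: the feature value alone cannot tell you which of those words produced it, so the reverse index only gives you candidates.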
On Fri, Jan 1, 2016 at 2:20 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> Hi
>
> I am working on proof of concept. I am trying to use spark to classify
> some documents. I am using tokenizer and hashingTF to convert the documents
> into vectors. Is there any easy way to map feature back to words or do I
> need to maintain the reverse index my self? I realize there is a chance
> some words map to same buck
>
> Kind regards
>
> Andy

--
Hayri Volkan Agun
PhD. Student - Anadolu University