Try flatMapToPair instead of flatMap Thanks, Sonal Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal> On Tue, Jun 19, 2018 at 11:08 PM, Soheil Pourbafrani <soheil.i...@gmail.com> wrote: > Hi, I have a JSON file in the following structure: > +--------------------+-------------------+ > | full_text| id| > +--------------------+-------------------+ > > I want to tokenize each sentence into pairs of (word, id) > > for example, having the record : ("Hi, How are you?", id) I want to > convert the dataframe to: > hi, id > how, id > are, id > you, id > ?, id > > So I try : > > data.rdd.map(lambda data : (data[0], data[1]))\ > .flatMap(lambda row: (word_tokenize(row[0].lower()), row[1]) > > but it converts the dataframe to: > [hi, how, are, you, ?] > > How can I do the desired transformation? >