Hi, I have a JSON file in the following structure:

    +--------------------+-------------------+
    |           full_text|                 id|
    +--------------------+-------------------+
I want to tokenize each sentence into (word, id) pairs. For example, given the record

    ("Hi, How are you?", id)

I want to convert the DataFrame to:

    hi, id
    how, id
    are, id
    you, id
    ?, id

So I tried:

    data.rdd.map(lambda data: (data[0], data[1])) \
        .flatMap(lambda row: (word_tokenize(row[0].lower()), row[1]))

but it converts the DataFrame to:

    [hi, how, are, you, ?]

How can I do the desired transformation?
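
For reference, here is a minimal sketch of the per-row logic that would produce the (word, id) pairs described above. It is an illustration under assumptions, not a tested Spark job: `simple_tokenize` is a hypothetical regex stand-in for NLTK's `word_tokenize` so the snippet runs without extra dependencies, `to_word_id_pairs` is a made-up helper name, and the sample row uses `1` as a placeholder id. The idea is that the function handed to `flatMap` should return a list of pairs (one per token), rather than a single tuple holding the token list and the id.

```python
import re
from itertools import chain


def simple_tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize:
    # runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def to_word_id_pairs(row):
    # row is assumed to be (full_text, id), in that column order.
    # Return one (word, id) pair per token so flatMap can flatten them.
    text, row_id = row[0], row[1]
    return [(word, row_id) for word in simple_tokenize(text.lower())]


# Local emulation of flatMap on a small sample (placeholder id = 1).
rows = [("Hi, How are you?", 1)]
pairs = list(chain.from_iterable(to_word_id_pairs(r) for r in rows))
print(pairs)

# With Spark, the same function would presumably be applied as:
#   data.rdd.flatMap(to_word_id_pairs)
```

Note that this tokenizer also emits the comma as its own token, so the sample yields six pairs rather than the five listed in the example. Swapping `simple_tokenize` for `word_tokenize` should only change how the text is split, not the shape of the result.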