Try flatMapToPair instead of flatMap

Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>



On Tue, Jun 19, 2018 at 11:08 PM, Soheil Pourbafrani <soheil.i...@gmail.com>
wrote:

> Hi, I have a JSON file in the following structure:
> +--------------------+-------------------+
> |           full_text|                 id|
> +--------------------+-------------------+
>
> I want to tokenize each sentence into pairs of (word, id)
>
> for example, having the record : ("Hi, How are you?", id) I want to
> convert the dataframe to:
> hi, id
> how, id
> are, id
> you, id
> ?, id
>
> So I try :
>
> data.rdd.map(lambda data : (data[0], data[1]))\
>    .flatMap(lambda row: (word_tokenize(row[0].lower()), row[1])
>
> but it converts the dataframe to:
> [hi, how, are, you, ?]
>
> How can I do the desired transformation?
>

Reply via email to