Thanks, Ayan and Nicholas, for your jet-fast replies! I appreciate it a lot.

Cheers,

Debu

On Fri, Oct 13, 2017 at 9:27 AM, ayan guha <guha.a...@gmail.com> wrote:

> Quick pyspark code:
>
> >>> from itertools import product
> >>> s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
> >>> base = sc.parallelize([s.split("|")])
> >>> base.take(10)
> [['ABZ', 'ABZ', 'AF', '2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y', '1,2,3,4,5',
> '730']]
>
> >>> def pv(t):
> ...     x = t[3].split(",")
> ...     y = t[4].split(",")
> ...     # pair every value of the 4th column with every value of the 5th
> ...     for k in product(x,y):
> ...         yield (t[0],t[1],t[2],k[0],k[1],t[5])
> ...
> >>> res = base.flatMap(pv)
> >>> res.take(10)
> [('ABZ', 'ABZ', 'AF', '2', '1', '730'), ('ABZ', 'ABZ', 'AF', '2', '2', '730'),
> ('ABZ', 'ABZ', 'AF', '2', '3', '730'), ('ABZ', 'ABZ', 'AF', '2', '4', '730'),
> ('ABZ', 'ABZ', 'AF', '2', '5', '730'), ('ABZ', 'ABZ', 'AF', '3', '1', '730'),
> ('ABZ', 'ABZ', 'AF', '3', '2', '730'), ('ABZ', 'ABZ', 'AF', '3', '3', '730'),
> ('ABZ', 'ABZ', 'AF', '3', '4', '730'), ('ABZ', 'ABZ', 'AF', '3', '5', '730')]
>
>
>
> On Fri, Oct 13, 2017 at 6:03 AM, Nicholas Hakobian <nicholas.hakobian@
> rallyhealth.com> wrote:
>
>> Using explode on the 4th column, followed by an explode on the 5th column
>> would produce what you want (you might need to use split on the columns
>> first if they are not already an array).
>>
>> Nicholas Szandor Hakobian, Ph.D.
>> Staff Data Scientist
>> Rally Health
>> nicholas.hakob...@rallyhealth.com
>>
>>
>> On Thu, Oct 12, 2017 at 9:09 AM, Debabrata Ghosh <mailford...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>         Greetings!
>>>
>>> I have data in rows of the following format:
>>>
>>> ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730
>>>
>>> I want to convert it into several rows in the format below:
>>>
>>> ABZ|ABZ|AF|2|1|730
>>> ABZ|ABZ|AF|2|2|730
>>> .
>>> .
>>> .
>>> ABZ|ABZ|AF|3|1|730
>>> ABZ|ABZ|AF|3|2|730
>>> ABZ|ABZ|AF|3|3|730
>>> .
>>> .
>>> .
>>> ABZ|ABZ|AF|Y|4|730
>>> ABZ|ABZ|AF|Y|5|730
>>>
>>> Basically, I want to consider the various combinations of the 4th and
>>> 5th columns (where the values are delimited by commas) and accordingly
>>> generate the above rows from a single row. Could you please suggest a
>>> good way of achieving this? Thanks in advance!
>>>
>>> Regards,
>>>
>>> Debu
>>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>

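The expansion asked for in the original question is a Cartesian product of the comma-separated values in the 4th and 5th fields. A minimal plain-Python sketch of that logic, without Spark, could look like the following; the helper name expand_row is made up for illustration, and it assumes every row has exactly six pipe-delimited fields.

```python
from itertools import product


def expand_row(line, sep="|", inner_sep=","):
    """Yield one delimited row per (4th-field, 5th-field) combination.

    Hypothetical helper; assumes exactly six fields per input row.
    """
    c1, c2, c3, c4, c5, c6 = line.split(sep)
    # product() pairs every value of the 4th field with every value of the 5th.
    for a, b in product(c4.split(inner_sep), c5.split(inner_sep)):
        yield sep.join((c1, c2, c3, a, b, c6))


row = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
rows = list(expand_row(row))

print(len(rows))  # 90 (18 values x 5 values)
print(rows[0])    # ABZ|ABZ|AF|2|1|730
print(rows[-1])   # ABZ|ABZ|AF|Y|5|730
```

Since the helper is a generator that takes a single line, it should also be usable as the function passed to flatMap on an RDD of raw text lines, in the same way as pv above.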