Thanks Ayan and NIcholas for your jetfast reply ! Appreciate it a lot. Cheers,
Debu On Fri, Oct 13, 2017 at 9:27 AM, ayan guha <guha.a...@gmail.com> wrote: > Quick pyspark code: > > >>> s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730" > >>> base = sc.parallelize([s.split("|")]) > >>> base.take(10) > [['ABZ', 'ABZ', 'AF', '2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y', '1,2,3,4,5', > '730']] > > >>> def pv(t): > ... x = t[3].split(",") > ... y = t[4].split(",") > ... for k in product(x,y): > ... yield (t[0],t[1],k[0],k[1],t[5]) > ... > >>> res = base.flatMap(pv) > >>> res.take(10) > [('ABZ', 'ABZ', '2', '1', '730'), ('ABZ', 'ABZ', '2', '2', '730'), ('ABZ', > 'ABZ', '2', '3', '730'), ('ABZ', 'ABZ', '2', '4', '730'), ('ABZ', 'ABZ', > '2', '5', '730'), ('ABZ', 'ABZ', '3', '1', '730'), ('ABZ', 'ABZ', '3', '2', > '730'), ('ABZ', 'ABZ', '3', '3', '730'), ('ABZ', 'ABZ', '3', '4', '730'), > ('ABZ', 'ABZ', '3', '5', '730')] > > > > On Fri, Oct 13, 2017 at 6:03 AM, Nicholas Hakobian <nicholas.hakobian@ > rallyhealth.com> wrote: > >> Using explode on the 4th column, followed by an explode on the 5th column >> would produce what you want (you might need to use split on the columns >> first if they are not already an array). >> >> Nicholas Szandor Hakobian, Ph.D. >> Staff Data Scientist >> Rally Health >> nicholas.hakob...@rallyhealth.com >> >> >> On Thu, Oct 12, 2017 at 9:09 AM, Debabrata Ghosh <mailford...@gmail.com> >> wrote: >> >>> Hi, >>> Greetings ! >>> >>> I am having data in the format of the following row: >>> >>> ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730 >>> >>> I want to convert it into several rows in the format below: >>> >>> ABZ|ABZ|AF|2|1|730 >>> ABZ|ABZ|AF|3+1|730 >>> . >>> . >>> . >>> ABZ|ABZ|AF|3|1|730 >>> ABZ|ABZ|AF|3|2|730 >>> ABZ|ABZ|AF|3|3|730 >>> . >>> . >>> . >>> ABZ|ABZ|AF|Y|4|730 >>> ABZ|ABZ|AF||Y|5|730 >>> >>> Basically, I want to consider the various combinations of the 4th and >>> 5th columns (where the values are delimited by commas) and accordingly >>> generate the above rows from a single row. Please can you suggest me for a >>> good way of acheiving this. Thanks in advance ! >>> >>> Regards, >>> >>> Debu >>> >> >> > > > -- > Best Regards, > Ayan Guha >