Re: How to FLATTEN hive column in Pig with ARRAY data type
Awesome... that's the way I would have done it as well.

On Mon, Jun 2, 2014 at 10:14 AM, Rahul Channe wrote:
> I tried changing the hive column datatype from ARRAY to STRUCT for
> cust_address, then i imported the table in pig.
Re: How to FLATTEN hive column in Pig with ARRAY data type
I tried changing the Hive column datatype from ARRAY to STRUCT for cust_address, then imported the table into Pig.

Now I am able to separate the fields, as below:

grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
grunt> describe Z;
Z: {cust_id: int,cust_name: chararray,cust_address: (house_no: int,street: chararray,city: chararray)}

grunt> Y = foreach Z generate cust_address.house_no as house_no,cust_address.street as street,UPPER(cust_address.city) as city;
grunt> describe Y;
Y: {house_no: int,street: chararray,city: chararray}

grunt> dump Y;
(2200,benjamin franklin,PHILADELPHIA)
(44,atlanta franklin,FLORIDA)

On Mon, Jun 2, 2014 at 1:09 PM, Rahul Channe wrote:
> grunt> B = foreach A generate BagToTuple(cust_address);
>
> grunt> dump B;
> ((2200,benjamin franklin,philadelphia))
> ((44,atlanta franklin,florida))
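A note on why the STRUCT version in this message is easy to project: a Hive STRUCT arrives in Pig as a tuple with a fixed number of named fields, so each field can be addressed by name. The sketch below is only a plain-Python model of that projection, not Pig code; the `Address` type and the `project` helper are hypothetical names, and the sample values are taken from the `dump Y` output above.

```python
from collections import namedtuple

# Hypothetical stand-in for the Pig tuple that a Hive STRUCT maps to:
# a fixed number of fields, each with a name.
Address = namedtuple("Address", ["house_no", "street", "city"])


def project(addr):
    """Model of: generate cust_address.house_no, cust_address.street, UPPER(cust_address.city)."""
    return (addr.house_no, addr.street, addr.city.upper())


projected = project(Address(2200, "benjamin franklin", "philadelphia"))
print(projected)
```

Because the field names are part of the type, no positional guessing is needed, which is the difference from the ARRAY (bag) version discussed in the rest of the thread.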
Re: How to FLATTEN hive column in Pig with ARRAY data type
grunt> B = foreach A generate BagToTuple(cust_address);

grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}

grunt> dump B;
((2200,benjamin franklin,philadelphia))
((44,atlanta franklin,florida))

On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota wrote:
> If you're using the built-in BagToTuple UDF, then you probably don't need
> the FLATTEN operator.
Re: How to FLATTEN hive column in Pig with ARRAY data type
If you're using the built-in BagToTuple UDF, then you probably don't need the FLATTEN operator.

I suspect that your output looks as follows:

2200
benjamin avenue
philadelphia
...

Can you confirm that this is what you're seeing?

On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe wrote:
> grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
> grunt> dump B;
> (2200,benjamin franklin,philadelphia)
> (44,atlanta franklin,florida)
Re: How to FLATTEN hive column in Pig with ARRAY data type
Thank you Pradeep, it worked to a certain extent, but I am having the following difficulty separating the fields of cust_address as $0, $1.

Example -

grunt> describe A;
A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}

grunt> dump A;
(123,phil abc,{(2200),(benjamin avenue),(philadelphia)},t...@gmail.com)
(124,diego arty,{(44),(atlanta franklin),(florida)},o...@gmail.com)

grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
grunt> dump B;
(2200,benjamin franklin,philadelphia)
(44,atlanta franklin,florida)

grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield: chararray}

I am not able to separate the fields in B as $0, $1 and $2. I tried using STRSPLIT, but it didn't work.

On Mon, Jun 2, 2014 at 11:50 AM, Pradeep Gollakota wrote:
> There was a similar question as this on StackOverflow a while back. The
> suggestion was to write a custom BagToTuple UDF.
>
> http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
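The describe and dump output in this message point at a mismatch: each row carries three values at run time, yet the schema reported for B names only one field (`innerfield`). That is presumably why positions past $0 could not be referenced. The following is a rough plain-Python model of that situation, not Pig code; `bag_to_tuple` is a hypothetical re-implementation written only for illustration, and the sample values come from `dump A`.

```python
def bag_to_tuple(bag):
    """Model of BagToTuple: concatenate the fields of every inner tuple."""
    return tuple(field for inner in bag for field in inner)


# The schema that `describe B` reports: a single field name.
declared_schema = ["innerfield"]

# One cust_address bag as shown by `dump A`: three single-field tuples.
cust_address = [("2200",), ("benjamin avenue",), ("philadelphia",)]

flattened = bag_to_tuple(cust_address)
print(len(declared_schema), "declared field(s),", len(flattened), "runtime value(s)")
```

In this model the data is already separated; what is missing is a schema that names all three positions.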
Re: How to FLATTEN hive column in Pig with ARRAY data type
There was a similar question to this on StackOverflow a while back. The suggestion was to write a custom BagToTuple UDF.

http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig

On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota wrote:
> Disregard last email.
>
> Sorry... didn't fully understand the question.
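For readers who want to try the custom-UDF route suggested in this message, here is a minimal sketch of what such a UDF might look like as a Pig Python (Jython) UDF. It is an illustration under stated assumptions, not the code from the linked answer: the function name `bag_to_address`, the field names in the schema string (borrowed from the STRUCT version elsewhere in the thread), and the fixed size of three elements are all hypothetical. It also relies on the bag keeping its elements in the order shown by `dump A`, which Pig's data model does not promise for bags in general.

```python
# Pig normally supplies the outputSchema decorator to Python UDF scripts.
# The fallback below is an assumption-free stand-in so that the file can
# also be imported and tried out in a plain Python interpreter.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("addr:tuple(house_no:chararray, street:chararray, city:chararray)")
def bag_to_address(bag):
    """Turn a bag of three single-field tuples into one three-field tuple.

    A Pig bag reaches a Python UDF as a list of tuples. Returns None when
    the bag is missing or does not hold exactly three elements.
    """
    if bag is None or len(bag) != 3:
        return None
    return tuple(inner[0] for inner in bag)


example = bag_to_address([("2200",), ("benjamin avenue",), ("philadelphia",)])
print(example)
```

Assuming the script is saved as `address_udf.py` (a hypothetical file name), it could be registered with `REGISTER 'address_udf.py' USING jython AS addr_udf;` and applied as `FLATTEN(addr_udf.bag_to_address(cust_address))`. The point of the design is that the declared output schema names all three fields, so they can be referenced individually afterwards.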
Re: How to FLATTEN hive column in Pig with ARRAY data type
Disregard last email.

Sorry... didn't fully understand the question.

On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota wrote:
> FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
Re: How to FLATTEN hive column in Pig with ARRAY data type
FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;

On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe wrote:
> Hi All,
>
> I have imported a Hive table into Pig that has a complex data type
> (ARRAY). The alias in Pig looks as below:
>
> grunt> describe A;
> A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
>
> grunt> dump A;
> (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},t...@gmail.com)
> (124,diego arty,{(44),(atlanta franklin),(florida)},o...@gmail.com)
>
> The cust_address is the ARRAY field from Hive. I want to FLATTEN the
> cust_address into different fields.
>
> Expected output
> (2200,benjamin avenue,philadelphia)
> (44,atlanta franklin,florida)
>
> please help
>
> Regards,
> Rahul
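Some context on why this first suggestion was withdrawn in the follow-up: FLATTEN applied to a bag un-nests it, producing one output row per inner tuple instead of widening the row into extra columns. The sketch below is a plain-Python model of that behaviour, not Pig code; `flatten_bag` is a hypothetical helper, and the cust_email column is left out because the addresses are elided in the archive.

```python
def flatten_bag(row, bag_index):
    """Model of FLATTEN on a bag column: emit one row per inner tuple."""
    before = row[:bag_index]
    bag = row[bag_index]
    after = row[bag_index + 1:]
    return [before + inner + after for inner in bag]


# First row of `dump A`, without the email column.
row = (123, "phil abc", [("2200",), ("benjamin avenue",), ("philadelphia",)])

rows = flatten_bag(row, 2)
for out in rows:
    print(out)
```

The customer ends up spread over three rows, one per address part, which is not the single (house number, street, city) row the question asks for.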