Thank you all for your expert suggestions. This was of great help.

Regards
Syed Wasti

> Date: Wed, 5 May 2010 12:25:08 -0700
> Subject: Re: Pig Latin Questions
> From: [email protected]
> To: [email protected]
> CC: [email protected]
>
> in that case:
>
> store into 'tmpdir';
> exec;
> fs -rmf 'destdir'
> mv 'tmpdir' 'destdir'
>
> -D
>
> On Wed, May 5, 2010 at 12:13 PM, Edward Capriolo <[email protected]> wrote:
> > On Wed, May 5, 2010 at 2:49 PM, Dmitriy Ryaboy <[email protected]> wrote:
> >
> >> Under the hood Hive tables are just files too.
> >> I am not sure what the INSERT OVERWRITE semantics are in edge cases
> >> (like if your query fails), but you may be able to simulate it using
> >> 'fs -mv' and 'fs -rmf' commands that Pig provides to operate on the
> >> hadoop file system.
> >> Note that for safety, Pig will refuse to run if you are trying to
> >> write into a directory that already exists, so you *must* use a move
> >> or a remove if you might already have data in the target location.
> >>
> >> All of that goes out the window for both Hive and Pig if you are using
> >> custom SerDes/StoreFuncs, which can do more or less whatever they
> >> want.
> >>
> >> -D
> >>
> >> On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
> >> > Hi Syed,
> >> >
> >> > 1. Released versions of pig don't support concept of table, there will be
> >> > one in owl specific loaders once they are available. Pig-latin output goes
> >> > into files (if store cmd is used) or STDOUT (if dump is used). The behavior
> >> > if the file already exists is determined by the StoreFunc , PigStorage will
> >> > give an error if the file already exists.
> >> >
> >> >
> >> > Re 2 & 3 - here is the translation to pig-latin -
> >> >
> >> > L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
> >> >
> >> > FIL = filter L by visible_flag == 1;
> >> >
> >> > G = group FIL BY (id, dept_type);
> >> >
> >> > FE = foreach G {
> >> > DEPT_IDS = FIL.dept_id; DIST_DEPT_IDS = distinct DEPT_IDS;
> >> > generate group.id, 'S' as org_type, group.dept_type, COUNT_STAR(FIL) as
> >> > cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct ;
> >> > }
> >> >
> >> > describe FE;
> >> > FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type:
> >> > bytearray,org_type: chararray}
> >> >
> >> > store FE into 'A'
> >> >
> >> >
> >> > On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
> >> >
> >> >>
> >> >>
> >> >> Hi,
> >> >> I am new to Hadoop and Pig Latin Language.
> >> >> I am trying to convert the below Hive QL to Pig Latin. Any suggestions please.
> >> >>
> >> >> INSERT OVERWRITE TABLE A
> >> >> SELECT id, org_type, dept_type, cnt, cnt_distinct
> >> >> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt, COUNT(DISTINCT
> >> >> dept_id) cnt_distinct
> >> >> FROM B
> >> >> WHERE visible_flag = 1
> >> >> GROUP BY id, dept_type
> >> >>
> >> >> Questions:
> >> >> 1. Is there an option to overwrite the table ? OR what does Pig Latin offer ?
> >> >> 2. You can see in the inner Query "'S' org_type" I am creating a new column
> >> >> and inserting 'S' as the value to this. what does Pig Latin offer ?
> >> >> 3. Related to Q2, "COUNT(1) cnt" for every id I am incrementing the count
> >> >> based on how many dept_type and id has and generating a new column and
> >> >> inserting the count in there. How can I do this in pig ?
> >> >>
> >> >> Thanks for you help.
> >> >>
> >> >> Regards
> >> >> MD
> >> >
> >> >
> >> >
> >
> > The semantics of INSERT OVERWRITE are simple. The output of your queries are
> > written to a temp folder and the final step it is moved to its final
> > destination. So you should never end up with partial files in the final
> > directory.
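
For reference, the suggestions in this thread could be combined into one script roughly as follows. This is only a sketch, not a tested recipe: it assumes a Pig version that provides the `rmf`, `mv` and `exec` shell commands, the staging path `A_tmp` is a hypothetical name that does not appear in the thread, and `A` and `B` are treated as plain HDFS paths. It reuses Thejas's translation and writes to the staging directory first, so the destination never holds partial output from a failed job.

```pig
-- Sketch only: Thejas's translation plus Dmitriy's store-then-move idea.
-- 'A_tmp' is a hypothetical staging directory; 'A' and 'B' come from the thread.

-- Clear any stale staging output, because Pig refuses to store into
-- a directory that already exists.
rmf A_tmp

L   = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
FIL = filter L by visible_flag == 1;
G   = group FIL by (id, dept_type);
FE  = foreach G {
    DEPT_IDS      = FIL.dept_id;
    DIST_DEPT_IDS = distinct DEPT_IDS;
    -- 'S' as org_type adds the constant column; COUNT_STAR counts every row
    -- in the group, COUNT over the distinct bag gives the distinct count.
    generate group.id as id, 'S' as org_type, group.dept_type as dept_type,
             COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
}

-- Write to the staging directory and force the job to run before touching 'A'.
store FE into 'A_tmp';
exec;

-- Replace the old output with the new one.
rmf A
mv A_tmp A
```

Note that the last two commands are not one atomic step: for a short moment between `rmf` and `mv` the directory `A` does not exist, so readers of `A` may need to tolerate that window.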
