Under the hood, Hive tables are just files too. I am not sure what the INSERT OVERWRITE semantics are in edge cases (for example, if your query fails), but you may be able to simulate it using the 'fs -mv' and 'rmf' commands that Pig provides to operate on the Hadoop file system. Note that for safety, Pig will refuse to run if you are trying to write into a directory that already exists, so you *must* use a move or a remove if you might already have data in the target location.
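
To make that concrete, here is a rough sketch of the "store somewhere else, then swap" approach, reusing Thejas's translation from below. 'A_tmp' is a made-up scratch path, not something from the original question. I am also assuming that Pig finishes the pending store before it runs the rmf/mv lines, and that a failed store aborts the script before the old data is touched; I have not verified either across Pig versions, so treat this as a starting point rather than a guaranteed equivalent of INSERT OVERWRITE.

```pig
-- Sketch only: 'A_tmp' is a hypothetical scratch location.
rmf A_tmp;                      -- clear leftovers from an earlier failed run

L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
FIL = filter L by visible_flag == 1;
G = group FIL by (id, dept_type);
FE = foreach G {
    DEPT_IDS = FIL.dept_id;
    DIST_DEPT_IDS = distinct DEPT_IDS;
    generate group.id, 'S' as org_type, group.dept_type,
             COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
}

-- Write the new result next to the target instead of on top of it.
store FE into 'A_tmp';

-- Swap: drop the old data (rmf does not fail if 'A' is missing),
-- then move the fresh output into place.
rmf A;
mv A_tmp A;
```

If you do not care about keeping the old data around when the job fails, a plain 'rmf A;' before 'store FE into 'A';' is the simpler variant.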
All of that goes out the window for both Hive and Pig if you are using custom SerDes/StoreFuncs, which can do more or less whatever they want.

-D

On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
> Hi Syed,
>
> 1. Released versions of pig don't support concept of table, there will be
> one in owl specific loaders once they are available. Pig-latin output goes
> into files (if store cmd is used) or STDOUT (if dump is used). The behavior
> if the file already exists is determined by the StoreFunc , PigStorage will
> give an error if the file already exists.
>
>
> Re 2 & 3 - here is the translation to pig-latin -
>
> L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
>
> FIL = filter L by visible_flag == 1;
>
> G = group FIL BY (id, dept_type);
>
> FE = foreach G {
>     DEPT_IDS = FIL.dept_id;
>     DIST_DEPT_IDS = distinct DEPT_IDS;
>     generate group.id, 'S' as org_type, group.dept_type, COUNT_STAR(FIL) as
> cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct ;
> }
>
> describe FE;
> FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type:
> bytearray,org_type: chararray}
>
> store FE into 'A'
>
>
> On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
>
>>
>>
>> Hi,
>> I am new to Hadoop and Pig Latin Language.
>> I am trying to convert the below Hive QL to Pig Latin. Any suggestions
>> please.
>>
>> INSERT OVERWRITE TABLE A
>> SELECT id, org_type, dept_type, cnt, cnt_distinct
>> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt, COUNT(DISTINCT
>> dept_id) cnt_distinct
>> FROM B
>> WHERE visible_flag = 1
>> GROUP BY id, dept_type
>>
>> Questions:
>> 1. Is there an option to overwrite the table ? OR what does Pig Latin offer ?
>> 2. You can see in the inner Query "'S' org_type" I am creating a new column
>> and inserting 'S' as the value to this. what does Pig Latin offer ?
>> 3. Related to Q2, "COUNT(1) cnt" for every id I am incrementing the count
>> based on how many dept_type and id has and generating a new column and
>> inserting the count in there. How can I do this in pig ?
>>
>> Thanks for you help.
>>
>> Regards
>> MD
