On Wed, May 5, 2010 at 2:49 PM, Dmitriy Ryaboy <[email protected]> wrote:
> Under the hood Hive tables are just files too.
> I am not sure what the INSERT OVERWRITE semantics are in edge cases
> (like if your query fails), but you may be able to simulate it using
> 'fs -mv' and 'fs -rmf' commands that Pig provides to operate on the
> hadoop file system.
> Note that for safety, Pig will refuse to run if you are trying to
> write into a directory that already exists, so you *must* use a move
> or a remove if you might already have data in the target location.
>
> All of that goes out the window for both Hive and Pig if you are using
> custom SerDes/StoreFuncs, which can do more or less whatever they
> want.
>
> -D
>
> On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
> > Hi Syed,
> >
> > 1. Released versions of pig don't support concept of table, there will be
> > one in owl specific loaders once they are available. Pig-latin output goes
> > into files (if store cmd is used) or STDOUT (if dump is used). The behavior
> > if the file already exists is determined by the StoreFunc , PigStorage will
> > give an error if the file already exists.
> >
> >
> > Re 2 & 3 - here is the translation to pig-latin -
> >
> > L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
> >
> > FIL = filter L by visible_flag == 1;
> >
> > G = group FIL BY (id, dept_type);
> >
> > FE = foreach G {
> > DEPT_IDS = FIL.dept_id; DIST_DEPT_IDS = distinct DEPT_IDS;
> > generate group.id, 'S' as org_type, group.dept_type, COUNT_STAR(FIL) as
> > cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct ;
> > }
> >
> > describe FE;
> > FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type:
> > bytearray,org_type: chararray}
> >
> > store FE into 'A'
> >
> >
> > On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
> >
> >>
> >>
> >> Hi,
> >> I am new to Hadoop and Pig Latin Language.
> >> I am trying to convert the below Hive QL to Pig Latin. Any suggestions please.
> >>
> >> INSERT OVERWRITE TABLE A
> >> SELECT id, org_type, dept_type, cnt, cnt_distinct
> >> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt, COUNT(DISTINCT
> >> dept_id) cnt_distinct
> >> FROM B
> >> WHERE visible_flag = 1
> >> GROUP BY id, dept_type
> >>
> >> Questions:
> >> 1. Is there an option to overwrite the table ? OR what does Pig Latin offer ?
> >> 2. You can see in the inner Query "'S' org_type" I am creating a new column
> >> and inserting 'S' as the value to this. what does Pig Latin offer ?
> >> 3. Related to Q2, "COUNT(1) cnt" for every id I am incrementing the count
> >> based on how many dept_type and id has and generating a new column and
> >> inserting the count in there. How can I do this in pig ?
> >>
> >> Thanks for you help.
> >>
> >> Regards
> >> MD
> >
> >

The semantics of INSERT OVERWRITE are simple. The output of your query is written to a temp folder, and in the final step it is moved to its final destination. So you should never end up with partial files in the final directory.
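
If you want something close to that from Pig, one option is to store into a staging directory and then swap it into place. This is only a rough sketch: the staging path 'A_tmp' is a name I made up, FE is the relation from Thejas's script, and I am assuming the grunt commands rmf and mv behave in a script the way they do in the interactive shell, so check it against your Pig version.

```pig
-- Hypothetical staging path; 'A_tmp' is not from the original thread.
-- Clear leftovers from an earlier failed run (rmf does not complain if the path is missing).
rmf A_tmp;

-- Write the full result to the staging directory, not to the target.
store FE into 'A_tmp';

-- Assumption: the lines below are reached only if the store above succeeded.
-- Drop the old contents of the target, then move the new result into place.
rmf A;
mv A_tmp A;
```

Unlike Hive's single final move, the swap here is two steps, so there is a short window after the rmf where 'A' does not exist. Whether a failed store stops the script before the rmf runs depends on how Pig is invoked, so verify that before relying on this.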
