In that case: store into 'tmpdir'; exec; rmf 'destdir'; fs -mv 'tmpdir' 'destdir'
-D

On Wed, May 5, 2010 at 12:13 PM, Edward Capriolo <[email protected]> wrote:

> On Wed, May 5, 2010 at 2:49 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Under the hood Hive tables are just files too.
>> I am not sure what the INSERT OVERWRITE semantics are in edge cases
>> (like if your query fails), but you may be able to simulate it using
>> the 'fs -mv' and 'rmf' commands that Pig provides to operate on the
>> Hadoop file system.
>> Note that for safety, Pig will refuse to run if you are trying to
>> write into a directory that already exists, so you *must* use a move
>> or a remove if you might already have data in the target location.
>>
>> All of that goes out the window for both Hive and Pig if you are using
>> custom SerDes/StoreFuncs, which can do more or less whatever they
>> want.
>>
>> -D
>>
>> On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
>> > Hi Syed,
>> >
>> > 1. Released versions of Pig don't support the concept of a table; there
>> > will be one in the Owl-specific loaders once they are available.
>> > Pig Latin output goes into files (if the store command is used) or
>> > STDOUT (if dump is used). The behavior if the file already exists is
>> > determined by the StoreFunc; PigStorage will give an error if the
>> > file already exists.
>> >
>> > Re 2 & 3 - here is the translation to Pig Latin:
>> >
>> > L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
>> >
>> > FIL = filter L by visible_flag == 1;
>> >
>> > G = group FIL by (id, dept_type);
>> >
>> > FE = foreach G {
>> >     DEPT_IDS = FIL.dept_id;
>> >     DIST_DEPT_IDS = distinct DEPT_IDS;
>> >     generate group.id, 'S' as org_type, group.dept_type,
>> >         COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
>> > }
>> >
>> > describe FE;
>> > FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type: bytearray,org_type: chararray}
>> >
>> > store FE into 'A';
>> >
>> >
>> > On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
>> >
>> >> Hi,
>> >> I am new to Hadoop and the Pig Latin language.
>> >> I am trying to convert the Hive QL below to Pig Latin. Any suggestions, please?
>> >>
>> >> INSERT OVERWRITE TABLE A
>> >> SELECT id, org_type, dept_type, cnt, cnt_distinct
>> >> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt,
>> >>              COUNT(DISTINCT dept_id) cnt_distinct
>> >>       FROM B
>> >>       WHERE visible_flag = 1
>> >>       GROUP BY id, dept_type) t;
>> >>
>> >> Questions:
>> >> 1. Is there an option to overwrite the table? If not, what does Pig Latin
>> >> offer?
>> >> 2. In the inner query, "'S' org_type" creates a new column and inserts
>> >> 'S' as its value. What does Pig Latin offer for this?
>> >> 3. Related to Q2, "COUNT(1) cnt" counts the rows for each id and
>> >> dept_type combination and puts the count in a new column. How can I
>> >> do this in Pig?
>> >>
>> >> Thanks for your help.
>> >>
>> >> Regards
>> >> MD
>> >
>> >
>
> The semantics of INSERT OVERWRITE are simple. The output of your query is
> written to a temp folder, and in the final step it is moved to its final
> destination. So you should never end up with partial files in the final
> directory.
>
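For anyone who wants to see the write-to-temp-then-move pattern spelled out, here is a rough sketch in Python against the local file system. It is only an analogy for the store / remove / move sequence discussed in this thread, not what Hive or Pig actually run on HDFS. The names overwrite_dir and write_output are made up for the illustration.

```python
import shutil
import tempfile
from pathlib import Path


def overwrite_dir(dest: Path, write_output) -> None:
    """Replace `dest` with fresh output, staging it in a temp directory first.

    `write_output` is a hypothetical callback that writes result files into
    the directory it is given (the stand-in for: store ... into 'tmpdir').
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the destination so the final move is a plain rename.
    tmp = Path(tempfile.mkdtemp(prefix=dest.name + ".tmp-", dir=dest.parent))
    try:
        write_output(tmp)
    except Exception:
        # Failed run: drop the partial output, leave the old data in place.
        shutil.rmtree(tmp, ignore_errors=True)
        raise
    shutil.rmtree(dest, ignore_errors=True)  # stand-in for removing 'destdir'
    tmp.rename(dest)                         # stand-in for mv 'tmpdir' 'destdir'


def write_part(out_dir: Path) -> None:
    # Illustrative output file; the row content is invented for the demo.
    (out_dir / "part-00000").write_text("1\tS\teng\t3\t2\n")


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "A"
    overwrite_dir(target, write_part)
    overwrite_dir(target, write_part)  # second run replaces the first
    print(sorted(p.name for p in target.iterdir()))
```

Note that the remove and the rename are two separate steps, so there is a short window in which the destination does not exist at all. A failed query never leaves partial files in the destination, but a reader arriving during the swap may find nothing there.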
