On Wed, May 5, 2010 at 2:49 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Under the hood Hive tables are just files too.
> I am not sure what the INSERT OVERWRITE semantics are in edge cases
> (like if your query fails), but you may be able to simulate it using the
> 'fs -mv' and 'rmf' commands that Pig provides to operate on the
> Hadoop file system.
> Note that for safety, Pig will refuse to run if you are trying to
> write into a directory that already exists, so you *must* use a move
> or a remove if you might already have data in the target location.
>
> All of that goes out the window for both Hive and Pig if you are using
> custom SerDes/StoreFuncs, which can do more or less whatever they
> want.
>
> -D
>
> On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
> > Hi Syed,
> >
> > 1. Released versions of Pig don't support the concept of a table; there
> > will be one in Owl-specific loaders once they are available. Pig Latin
> > output goes into files (if the store command is used) or STDOUT (if dump
> > is used). The behavior if the file already exists is determined by the
> > StoreFunc; PigStorage will give an error if the file already exists.
> >
> >
> > Re 2 & 3  - here is the translation to pig-latin -
> >
> > L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
> >
> > FIL = filter L by visible_flag == 1;
> >
> > G = group FIL BY (id, dept_type);
> >
> > FE = foreach G {
> >   DEPT_IDS = FIL.dept_id;
> >   DIST_DEPT_IDS = distinct DEPT_IDS;
> >   generate group.id, 'S' as org_type, group.dept_type,
> >            COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
> > }
> >
> > describe FE;
> > FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type:
> > bytearray,org_type: chararray}
> >
> > store FE into 'A';
> >
> >
> > On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
> >
> >>
> >>
> >> Hi,
> >> I am new to Hadoop and the Pig Latin language.
> >> I am trying to convert the Hive QL below to Pig Latin. Any suggestions,
> >> please?
> >>
> >> INSERT OVERWRITE TABLE A
> >> SELECT id, org_type, dept_type, cnt, cnt_distinct
> >> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt,
> >>              COUNT(DISTINCT dept_id) cnt_distinct
> >>       FROM B
> >>       WHERE visible_flag = 1
> >>       GROUP BY id, dept_type) t;
> >>
> >> Questions:
> >> 1. Is there an option to overwrite the table? If not, what does Pig
> >> Latin offer?
> >> 2. In the inner query, "'S' org_type" creates a new column with 'S' as
> >> its value. What does Pig Latin offer for this?
> >> 3. Related to Q2, "COUNT(1) cnt" counts the rows for each id and
> >> dept_type combination and puts the count in a new column. How can I do
> >> this in Pig?
> >>
> >> Thanks for your help.
> >>
> >> Regards
> >> MD
> >>
> >
> >
>

The semantics of INSERT OVERWRITE are simple. The output of your query is
written to a temporary folder, and only in the final step is it moved to its
final destination. So if the query fails partway through, you should never
end up with partial files in the final directory.
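
To make the "write to a temp folder, then move" idea concrete, here is a
minimal sketch of the same pattern on a local filesystem in Python. It is
not Hive's actual implementation; the function name overwrite_dir and the
".staging-" prefix are made up for illustration, and it assumes the staging
directory can live next to the destination so the final move is one rename.

```python
import os
import shutil
import tempfile


def overwrite_dir(final_dir, files):
    """Write `files` (name -> text) to a staging dir, then swap it into place.

    Hypothetical helper: mimics the write-then-move pattern described above.
    """
    parent = os.path.dirname(os.path.abspath(final_dir))
    # Stage next to the destination so the last step is a plain rename.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for name, text in files.items():
            with open(os.path.join(staging, name), "w") as f:
                f.write(text)
        # The destination is touched only after every file has been written.
        if os.path.isdir(final_dir):
            shutil.rmtree(final_dir)
        os.rename(staging, final_dir)
    except BaseException:
        # A failed run discards the staging output instead of publishing it.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

If writing any file fails, the old contents of the destination stay as they
were. There is still a short window between removing the old directory and
renaming the new one, so this sketch is not fully atomic.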
