Thank you all for your expert suggestions. This was of great help. 
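
For anyone finding this thread later, here is a rough sketch of how the suggestions quoted below might fit together in one script. It is only a sketch under assumptions: the paths 'B', 'tmpdir' and 'destdir' are placeholders, the default PigStorage is used for both load and store, and the rmf/mv steps are not atomic, so a reader could briefly see no data at 'destdir'.

```pig
-- Sketch only: 'B', 'tmpdir' and 'destdir' are placeholder paths.
L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);

-- WHERE visible_flag = 1
FIL = filter L by visible_flag == 1;

-- GROUP BY id, dept_type
G = group FIL by (id, dept_type);

FE = foreach G {
    DEPT_IDS = FIL.dept_id;
    DIST_DEPT_IDS = distinct DEPT_IDS;
    -- 'S' as org_type adds the constant column;
    -- COUNT_STAR counts every row in the group, like COUNT(1) in Hive.
    generate group.id as id, 'S' as org_type, group.dept_type as dept_type,
             COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
}

-- Write to a temporary location first, then swap it into place.
store FE into 'tmpdir';
exec;            -- force the store to run before the file system commands
rmf destdir      -- remove the old output if it exists
mv tmpdir destdir
```

If the job fails, the script stops before the rmf, so the old contents of 'destdir' should stay in place.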

Regards
Syed Wasti


> Date: Wed, 5 May 2010 12:25:08 -0700
> Subject: Re: Pig Latin Questions
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> 
> in that case:
> 
> store FE into 'tmpdir';
> exec;
> rmf destdir
> mv tmpdir destdir
> 
> -D
> 
> On Wed, May 5, 2010 at 12:13 PM, Edward Capriolo <[email protected]> wrote:
> > On Wed, May 5, 2010 at 2:49 PM, Dmitriy Ryaboy <[email protected]> wrote:
> >
> >> Under the hood Hive tables are just files too.
> >> I am not sure what the INSERT OVERWRITE semantics are in edge cases
> >> (like if your query fails), but you may be able to simulate it using
> >> 'fs -mv' and 'fs -rmf' commands that Pig provides to operate on the
> >> hadoop file system.
> >> Note that for safety, Pig will refuse to run if you are trying to
> >> write into a directory that already exists, so you *must* use a move
> >> or a remove if you might already have data in the target location.
> >>
> >> All of that goes out the window for both Hive and Pig if you are using
> >> custom SerDes/StoreFuncs, which can do more or less whatever they
> >> want.
> >>
> >> -D
> >>
> >> On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
> >> > Hi Syed,
> >> >
> >> > 1. Released versions of Pig don't support the concept of a table; there
> >> > will be one in Owl-specific loaders once they are available. Pig Latin
> >> > output goes into files (if the store command is used) or to STDOUT (if
> >> > dump is used). The behavior when the file already exists is determined
> >> > by the StoreFunc; PigStorage will give an error if the file already
> >> > exists.
> >> >
> >> >
> >> > Re 2 & 3: here is the translation to Pig Latin:
> >> >
> >> > L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
> >> >
> >> > FIL = filter L by visible_flag == 1;
> >> >
> >> > G = group FIL BY (id, dept_type);
> >> >
> >> > FE = foreach G {
> >> >  DEPT_IDS = FIL.dept_id;
> >> >  DIST_DEPT_IDS = distinct DEPT_IDS;
> >> >  generate group.id, 'S' as org_type, group.dept_type,
> >> >    COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
> >> > }
> >> >
> >> > describe FE;
> >> > FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type:
> >> > bytearray,org_type: chararray}
> >> >
> >> > store FE into 'A';
> >> >
> >> >
> >> > On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
> >> >
> >> >>
> >> >>
> >> >> Hi,
> >> >> I am new to Hadoop and the Pig Latin language.
> >> >> I am trying to convert the Hive QL below to Pig Latin. Any suggestions,
> >> >> please?
> >> >>
> >> >> INSERT OVERWRITE TABLE A
> >> >> SELECT id, org_type, dept_type, cnt, cnt_distinct
> >> >> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt, COUNT(DISTINCT
> >> >> dept_id) cnt_distinct
> >> >>          FROM B
> >> >>          WHERE visible_flag = 1
> >> >>          GROUP BY id, dept_type
> >> >>      ) t;  -- closing paren added; the alias "t" is assumed
> >> >>
> >> >> Questions:
> >> >> 1. Is there an option to overwrite the table? If not, what does Pig
> >> >> Latin offer?
> >> >> 2. In the inner query, "'S' org_type" creates a new column and inserts
> >> >> 'S' as its value. What does Pig Latin offer for this?
> >> >> 3. Related to Q2, "COUNT(1) cnt" counts the rows for each combination
> >> >> of id and dept_type and puts that count in a new column. How can I do
> >> >> this in Pig?
> >> >>
> >> >> Thanks for your help.
> >> >>
> >> >> Regards
> >> >> MD
> >> >>
> >> >>
> >> >
> >> >
> >>
> >
> > The semantics of INSERT OVERWRITE are simple. The output of your query is
> > written to a temp folder, and in the final step it is moved to its final
> > destination. So you should never end up with partial files in the final
> > directory.
> >
                                          