Hi Syed,
1. Released versions of Pig don't support the concept of a table; there will
be one in the Owl-specific loaders once they are available. Pig Latin output
goes into files (if the store command is used) or to STDOUT (if dump is used).
What happens when the output file already exists is determined by the
StoreFunc; PigStorage will give an error if the file already exists.
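If you want behaviour closer to Hive's INSERT OVERWRITE, one option is to delete the old output from inside the script before storing. This is a minimal sketch, assuming your Pig version's Grunt shell provides the `rmf` command (which removes a path recursively and does not complain if it is missing) and that 'B' and 'A' are the input and output paths used in the translation:

```pig
-- Remove the previous output, if any; rmf does not fail when 'A' is missing.
rmf A
L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
FIL = filter L by visible_flag == 1;
-- With 'A' gone, PigStorage no longer raises an "already exists" error.
store FIL into 'A';
```

Unlike Hive's overwrite, this is not atomic: the old data is deleted before the job runs, so if the job fails, 'A' is left missing.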
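Re 2 & 3: here is the translation to Pig Latin. The constant 'S' as org_type
creates the new column, and COUNT_STAR(FIL) counts the rows in each
(id, dept_type) group, like COUNT(1) in Hive: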
L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
FIL = filter L by visible_flag == 1;
G = group FIL BY (id, dept_type);
FE = foreach G {
    DEPT_IDS = FIL.dept_id;
    DIST_DEPT_IDS = distinct DEPT_IDS;
    generate group.id, 'S' as org_type, group.dept_type,
        COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
}
describe FE;
-- describe prints:
-- FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type: bytearray,org_type: chararray}
store FE into 'A';
On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
>
>
> Hi,
> I am new to Hadoop and the Pig Latin language.
> I am trying to convert the Hive QL below to Pig Latin. Any suggestions, please?
>
> INSERT OVERWRITE TABLE A
> SELECT id, org_type, dept_type, cnt, cnt_distinct
> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt,
>         COUNT(DISTINCT dept_id) cnt_distinct
>       FROM B
>       WHERE visible_flag = 1
>       GROUP BY id, dept_type) t
>
> Questions:
> 1. Is there an option to overwrite the table? If not, what does Pig Latin offer?
> 2. In the inner query, "'S' org_type" creates a new column and inserts 'S'
> as its value. What does Pig Latin offer for this?
> 3. Related to Q2, "COUNT(1) cnt" counts, for every id, how many rows each
> (id, dept_type) pair has and puts that count in a new column. How can I do
> this in Pig?
>
> Thanks for your help.
>
> Regards
> MD