Re: Pig Latin Questions

Dmitriy Ryaboy Wed, 05 May 2010 12:25:36 -0700

in that case:

store FE into 'tmpdir';
exec;
fs -rmf 'destdir'
mv 'tmpdir' 'destdir'
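
Putting the pieces of this thread together, a minimal sketch of the whole
flow might look like the script below. It reuses the translation quoted
further down and stages the output before swapping it into place. The
staging path 'A_tmp' is a hypothetical name chosen for illustration; 'A' and
'B' are the paths used in the thread. It assumes Grunt's built-in rmf and mv
commands are available in your Pig version.

```pig
-- Sketch only: 'A_tmp' is a hypothetical staging directory, not from the thread.
L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
FIL = filter L by visible_flag == 1;
G = group FIL by (id, dept_type);
FE = foreach G {
  DEPT_IDS = FIL.dept_id;
  DIST_DEPT_IDS = distinct DEPT_IDS;
  generate group.id, 'S' as org_type, group.dept_type,
           COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
}

-- Clear any leftover staging output; rmf does not fail if the path is missing.
rmf A_tmp

-- Write to the staging directory and force the pending statements to run.
store FE into 'A_tmp';
exec;

-- Replace the old output with the staged output.
rmf A
mv A_tmp A
```

The swap is not atomic: between the rmf and the mv, 'A' briefly does not
exist, so readers of 'A' may need to tolerate that window. It is also worth
confirming that the store succeeded before the old output is removed.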


-D

On Wed, May 5, 2010 at 12:13 PM, Edward Capriolo <[email protected]> wrote:
> On Wed, May 5, 2010 at 2:49 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Under the hood Hive tables are just files too.
>> I am not sure what the INSERT OVERWRITE semantics are in edge cases
>> (like if your query fails), but you may be able to simulate it using
>> 'fs -mv' and 'fs -rmf' commands that Pig provides to operate on the
>> hadoop file system.
>> Note that for safety, Pig will refuse to run if you are trying to
>> write into a directory that already exists, so you *must* use a move
>> or a remove if you might already have data in the target location.
>>
>> All of that goes out the window for both Hive and Pig if you are using
>> custom SerDes/StoreFuncs, which can do more or less whatever they
>> want.
>>
>> -D
>>
>> On Wed, May 5, 2010 at 11:07 AM, Thejas Nair <[email protected]> wrote:
>> > Hi Syed,
>> >
>> > 1. Released versions of Pig don't support the concept of a table; there
>> > will be one in Owl-specific loaders once they are available. Pig Latin
>> > output goes into files (if the store command is used) or STDOUT (if dump
>> > is used). The behavior if the file already exists is determined by the
>> > StoreFunc; PigStorage will give an error if the file already exists.
>> >
>> >
>> > Re 2 & 3: here is the translation to Pig Latin:
>> >
>> > L = load 'B' as (id, dept_type, dept_id, visible_flag, org_type);
>> >
>> > FIL = filter L by visible_flag == 1;
>> >
>> > G = group FIL BY (id, dept_type);
>> >
>> > FE = foreach G {
>> >   DEPT_IDS = FIL.dept_id;
>> >   DIST_DEPT_IDS = distinct DEPT_IDS;
>> >   generate group.id, 'S' as org_type, group.dept_type,
>> >            COUNT_STAR(FIL) as cnt, COUNT(DIST_DEPT_IDS) as cnt_distinct;
>> > }
>> >
>> > describe FE;
>> > FE: {cnt_distinct: long,cnt: long,id: bytearray,dept_type:
>> > bytearray,org_type: chararray}
>> >
>> > store FE into 'A';
>> >
>> >
>> > On 5/4/10 4:36 PM, "Syed Wasti" <[email protected]> wrote:
>> >
>> >>
>> >>
>> >> Hi,
>> >> I am new to Hadoop and the Pig Latin language.
>> >> I am trying to convert the HiveQL below to Pig Latin. Any suggestions,
>> >> please?
>> >>
>> >> INSERT OVERWRITE TABLE A
>> >> SELECT id, org_type, dept_type, cnt, cnt_distinct
>> >> FROM (SELECT id, 'S' org_type, dept_type, COUNT(1) cnt,
>> >>              COUNT(DISTINCT dept_id) cnt_distinct
>> >>       FROM B
>> >>       WHERE visible_flag = 1
>> >>       GROUP BY id, dept_type) t;
>> >>
>> >> Questions:
>> >> 1. Is there an option to overwrite the table? Or what does Pig Latin
>> >> offer?
>> >> 2. You can see in the inner query "'S' org_type": I am creating a new
>> >> column and inserting 'S' as its value. What does Pig Latin offer?
>> >> 3. Related to Q2, "COUNT(1) cnt": for every id I am counting how many
>> >> rows each id and dept_type combination has, generating a new column, and
>> >> inserting the count there. How can I do this in Pig?
>> >>
>> >> Thanks for your help.
>> >>
>> >> Regards
>> >> MD
>> >>
>> >>
>> >
>> >
>>
>
> The semantics of INSERT OVERWRITE are simple. The output of your query is
> written to a temp folder, and as the final step it is moved to its final
> destination. So you should never end up with partial files in the final
> directory.
>
