Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "NativeMapReduce" page has been changed by Aniket Mokashi.
http://wiki.apache.org/pig/NativeMapReduce

--------------------------------------------------

New page:
#format wiki
#language en

<<Navigation(children)>>
<<TableOfContents>>

This document captures the specification for native MapReduce jobs and a proposal for executing native MapReduce jobs inside a Pig script. This work is tracked at https://issues.apache.org/jira/browse/PIG-506.

== Introduction ==
Pig needs to provide a way to natively run MapReduce jobs written in Java. This has the following advantages:
 1. With the ''native'' keyword, the user need not worry about coordination between the jobs; Pig takes care of it.
 2. The user can make use of existing Java applications without being a Java programmer.

== Syntax ==
To support native MapReduce jobs, Pig will support the following syntax:
{{{
X = ... ;
Y = NATIVE ('mymr.jar' [, 'other.jar' ...])
    STORE X INTO 'storeLocation' USING storeFunc
    LOAD 'loadLocation' USING loadFunc [params, ... ];
}}}

This stores '''X''' into '''storeLocation''', which the native MapReduce job uses as its input. After the MapReduce job in mymr.jar has run, the data is loaded back from '''loadLocation''' into alias '''Y'''.

== Comparison with similar features ==
=== Pig Streaming ===

=== Hive Transform ===

== Native Mapreduce job specification ==
A native MapReduce job needs to conform to a specification defined by Pig. Pig specifies the input and output directory for this job and is responsible for

== Implementation Details ==

== References ==
 1. <<Anchor(ref1)>> PIG-506, "Does pig need a NATIVE keyword?", https://issues.apache.org/jira/browse/PIG-506
 2. <<Anchor(ref2)>> Pig Wiki, "Pig Streaming Functional Specification", http://wiki.apache.org/pig/PigStreamingFunctionalSpec
 3. <<Anchor(ref3)>> Hive Wiki, "Transform/Map-Reduce Syntax", http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform
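
As an illustration of the syntax proposed in the Syntax section, the sketch below shows how a script might hand an intermediate alias to a native job and read the result back. It is only a sketch of the proposal on this page, not a confirmed implementation: the jar name (wordcount.jar), the aliases, the paths, and the choice of PigStorage as both store and load function are all hypothetical. How Pig passes the store and load locations to the native job is not specified on this page, so the example does not show it.

{{{
-- Hypothetical input; file name and schema are assumptions for illustration.
A = LOAD 'input/pages.txt' USING PigStorage('\t') AS (url:chararray, text:chararray);
B = FILTER A BY text IS NOT NULL;

-- Proposed NATIVE syntax: store B where the native job reads its input,
-- run the MapReduce job packaged in wordcount.jar (hypothetical jar),
-- then load the job's output back into alias C.
C = NATIVE ('wordcount.jar')
    STORE B INTO 'tmp/native_in' USING PigStorage('\t')
    LOAD 'tmp/native_out' USING PigStorage('\t');

-- C can then be used like any other alias.
STORE C INTO 'output/wordcounts';
}}}

In this sketch Pig would be responsible for running the statements that produce '''B''' before the native job and the final store after it, which is the job coordination the Introduction refers to.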