Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by UtkarshSrivastava:
http://wiki.apache.org/pig/PigFunctions

------------------------------------------------------------------------------
  * ''"Reduce" behavior:'' Recall that in the Pig data model, a tuple may contain fields of type ''bag''. Hence an Eval Function may perform aggregation or "reducing" by iterating over a bag of tuples nested within the input tuple. This is how the built-in aggregation function SUM(...) works, for example.

  The other types of functions are:
- * '''Filter Function:''' evalutes to True or False when given a tuple; used to eliminate unwanted tuples from a relation or bag
- * '''Group Function:''' assigns tuples to group(s)
  * '''Load Function:''' controls reading of tuples from files
  * '''Store Function:''' controls storing of tuples to files

  [[Anchor(Example)]]
  ==== Example ====
- The following example uses each of the five types of functions. It computes the set of unique IP addresses associated with "good" products drawn from a list of products found on the web.
+ The following example uses each of the types of functions. It computes the set of unique IP addresses associated with "good" products drawn from a list of products found on the web.

  {{{
  register myFunctions.jar
  products = LOAD '/productlist.txt' USING MyListStorage() AS (name, price, description, url);
- goodProducts = FILTER products BY (price <= '19.99' AND MyFilter(description));
+ goodProducts = FILTER products BY (price <= '19.99');
  hostnames = FOREACH goodProducts GENERATE MyHostExtractor(url) AS hostname;
  uniqueIPs = FOREACH (GROUP hostnames BY MyIPLookup(hostname)) GENERATE group AS ipAddress;
  STORE uniqueIPs INTO '/iplist.txt' USING MyListStorage();
  }}}

- In the above example, !MyListStorage() serves as a load function as well as a store function; !MyFilter() is a filter function; !MyHostExtractor() is an eval function; MyIPLookup() is a group function. `myFunctions.jar` is a jar file that contains the classes for the user-defined functions.
+ In the above example, !MyListStorage() serves as a load function as well as a store function; !MyHostExtractor() and !MyIPLookup() are eval functions. `myFunctions.jar` is a jar file that contains the classes for the user-defined functions.

  [[Anchor(How_to_write_functions)]]
  === How to write functions ===
@@ -41, +39 @@
  Click below to learn how to build your own:
  * EvalFunction
- * FilterFunction
- * GroupFunction
- * StorageFunction (These are the most difficult to write, and usually, the inbuilt ones should be enough)
+ * Load/Store Function (These are the most difficult to write, and usually, the inbuilt ones should be enough)

  [[Anchor(Ok,_I_have_written_my_function,_how_to_use_it?)]]
  === Ok, I have written my function, how to use it? ===
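The data flow of the revised example script (filter products on price, extract a hostname from each URL with an eval function, group the hostnames by an IP lookup, keep only the group keys) can be approximated outside Pig in plain Java. This is a minimal sketch under stated assumptions, not Pig's UDF API: `Product`, `isGood`, `extractHost`, `lookupIp` and `uniqueIps` are hypothetical names, the IP lookup reads a fixed table instead of querying DNS, and the price is parsed as a number rather than compared against the string constant '19.99' as in the script.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class UniqueIpsSketch {
    // Hypothetical stand-in for one row of /productlist.txt: (name, price, description, url).
    public record Product(String name, String price, String description, String url) {}

    // Plays the role of the FILTER step. Assumption: price is parsed as a double here,
    // whereas the Pig script compares the field against the constant '19.99'.
    public static boolean isGood(Product p) {
        return Double.parseDouble(p.price()) <= 19.99;
    }

    // Plays the role of the eval function MyHostExtractor(url): one value out per input.
    public static String extractHost(String url) {
        return URI.create(url).getHost();
    }

    // Hypothetical stand-in for MyIPLookup(hostname): a fixed table instead of a DNS query.
    public static String lookupIp(String host, Map<String, String> table) {
        return table.getOrDefault(host, "0.0.0.0");
    }

    // FILTER, then FOREACH ... GENERATE, then GROUP ... BY, keeping only the group keys
    // (the equivalent of "GENERATE group AS ipAddress").
    public static Set<String> uniqueIps(List<Product> products, Map<String, String> table) {
        return new TreeSet<>(products.stream()
                .filter(UniqueIpsSketch::isGood)
                .map(p -> extractHost(p.url()))
                .collect(Collectors.groupingBy(h -> lookupIp(h, table)))
                .keySet());
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
                new Product("lamp", "12.50", "desk lamp", "http://shop.example.com/lamp"),
                new Product("sofa", "499.00", "three-seat sofa", "http://furniture.example.org/sofa"));
        Map<String, String> table = Map.of("shop.example.com", "192.0.2.10");
        System.out.println(uniqueIps(products, table)); // prints [192.0.2.10]
    }
}
```

Grouping by the looked-up address and then discarding the grouped bags mirrors the script's use of GROUP to produce one row per distinct IP address; in Pig the same result could also come from DISTINCT, but the sketch follows the script as written.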