Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigBuiltins

New page:
[[Anchor(Built-in_functions)]]
== Built-in functions ==

We have a modest library of built-in functions. Feel free to contribute your 
own.

[[Anchor(Storage_Functions)]]
=== Storage Functions ===
These functions are used for loading/storing data.

   * '''!PigStorage''' - for loading/storing text files with delimited records. 
Note that !PigStorage can only store flat tuples, i.e., tuples having atomic 
fields. If you want to store nested data, use !BinStorage instead.
   * '''!BinStorage''' - for storing arbitrarily nested data, and for loading 
intermediate results that were previously stored with it.
   * '''!TextLoader''' - for loading unstructured text files. Each line is 
loaded as a tuple with a single field which is the entire line.  It cannot be 
used for storing data.
   * '''!PigDump''' - for storing arbitrarily nested data in human-readable 
format.
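
As a rough illustration of the difference between the loaders and storers above, the following Python sketch models the behaviour described in the list. It is not Pig's implementation: the function names `store_flat`, `load_delimited` and `load_text` are hypothetical, a tab delimiter is assumed, and nested data is modeled as Python tuples and lists.

```python
def store_flat(tuples, delimiter="\t"):
    """Model of PigStorage-style storing: one delimited line per flat tuple."""
    lines = []
    for t in tuples:
        for field in t:
            # Nested fields (modeled here as tuple/list) cannot be stored flat.
            if isinstance(field, (tuple, list)):
                raise ValueError(f"nested field {field!r}: not a flat tuple")
        lines.append(delimiter.join(str(f) for f in t))
    return lines


def load_delimited(lines, delimiter="\t"):
    """Model of PigStorage-style loading: split each line into fields."""
    return [tuple(line.split(delimiter)) for line in lines]


def load_text(lines):
    """Model of TextLoader-style loading: each line is a one-field tuple."""
    return [(line,) for line in lines]
```

In this model, a line such as `"a<TAB>1"` loads as a two-field tuple with `load_delimited` but as a single-field tuple with `load_text`, and `store_flat` rejects any tuple that contains a nested field, which is the case where the list above suggests !BinStorage.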

[[Anchor(Filter_Functions)]]
=== Filter Functions ===
   * '''!IsEmpty''' - tests whether a bag is empty

[[Anchor(Eval_Functions)]]
=== Eval Functions ===
   * '''COUNT''' - computes the number of elements in a bag (also known as the 
"cardinality" of a bag)
   * '''SUM''' - computes the sum of the numeric values in a single-column bag
   * '''AVG''' - computes the average of the numeric values in a single-column 
bag
   * '''MIN/MAX''' - computes the minimum/maximum of the numeric values in a 
single-column bag
   * '''ARITY''' - computes the number of fields in a tuple (also known as the 
"arity" of a tuple)
   * '''TOKENIZE''' - splits a string and outputs a bag of words
   * '''DIFF''' - compares the two fields of a tuple with arity 2. If the fields 
are !DataBags, it emits any tuples that are in one of the !DataBags but not 
the other. If the fields are values, it emits tuples with the values that do 
not match.
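
The descriptions above can be read as the following Python sketch, which is a model of the semantics and not Pig's code. A bag is modeled as a list of tuples, and all function names are hypothetical. Two points are assumptions rather than statements from this page: that TOKENIZE splits on whitespace, and that DIFF on two unequal values emits one single-field tuple per value.

```python
def column(bag):
    """Values of a single-column bag (a list of one-field tuples)."""
    return [t[0] for t in bag]


def count(bag):
    return len(bag)                      # cardinality of the bag


def total(bag):
    return sum(column(bag))              # models SUM


def avg(bag):
    return total(bag) / count(bag)       # models AVG; assumes a non-empty bag


def minimum(bag):
    return min(column(bag))              # models MIN


def maximum(bag):
    return max(column(bag))              # models MAX


def arity(t):
    return len(t)                        # number of fields in a tuple


def tokenize(s):
    # Assumption: words are separated by whitespace.
    return [(word,) for word in s.split()]


def diff(pair):
    """Models DIFF on a tuple of arity 2."""
    left, right = pair
    if isinstance(left, list) and isinstance(right, list):
        # Bags: tuples present in one bag but not in the other.
        return ([t for t in left if t not in right]
                + [t for t in right if t not in left])
    # Plain values: nothing if they match, otherwise one tuple per value.
    return [] if left == right else [(left,), (right,)]
```

For example, in this model `diff(([(1,), (2,)], [(2,), (3,)]))` yields the tuples `(1,)` and `(3,)`, since `(2,)` appears in both bags.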

[[Anchor(Group_Functions)]]
=== Group Functions ===

There are as yet no built-in group functions because usually users just want to 
group by the values of fields. If you want all tuples to go in the same group, 
you can use `GROUP <alias> ALL`. Similarly, you can say `GROUP <alias> ANY` if 
you don't care about how tuples are grouped. See PigLatin.
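
To make the contrast between grouping by field values and `GROUP <alias> ALL` concrete, here is a small Python model. It is only a sketch: the function names are hypothetical, and using the string `"all"` as the key of the single group is a modeling choice made here, not something this page specifies.

```python
def group_by_field(tuples, index):
    """Model of grouping by the value of one field: one bag per distinct value."""
    groups = {}
    for t in tuples:
        groups.setdefault(t[index], []).append(t)
    return groups


def group_all(tuples):
    """Model of GROUP <alias> ALL: every tuple lands in the same single group."""
    return {"all": list(tuples)}
```

With the input `[("a", 1), ("b", 2), ("a", 3)]`, grouping by field 0 produces two bags (one for `"a"`, one for `"b"`), while `group_all` produces one bag holding all three tuples, which is the shape a whole-relation aggregate such as COUNT needs.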
