An aggregate function is an eval function that takes a bag and returns a
scalar value. One interesting and useful property of many aggregate functions
is that they can be computed incrementally in a distributed fashion. We call
these functions `algebraic`. `COUNT` is an example of an algebraic function
because we can count the number of elements in a subset of the data and then
sum the counts to produce a final output. In the Hadoop world, this means that
the partial computations can be done by the map and combiner, and the final
result can be computed by the reducer.
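To make the idea concrete, here is a minimal sketch in plain Java with no Pig classes. The class and method names (`AlgebraicCountSketch`, `partialCount`, `combine`) are hypothetical and chosen only for illustration; they are not part of Pig's API. Each subset stands in for the data seen by one map or combiner task.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: shows why counting can be split into partial and final steps.
public class AlgebraicCountSketch {

    // Partial step (map/combiner side): count the elements of one subset.
    static long partialCount(List<String> subset) {
        return subset.size();
    }

    // Final step (reducer side): sum the partial counts.
    static long combine(List<Long> partialCounts) {
        long total = 0;
        for (long c : partialCounts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> split1 = Arrays.asList("a", "b", "c");
        List<String> split2 = Arrays.asList("d", "e");

        // Counting each split separately and summing gives the same
        // answer as counting all five elements at once.
        long total = combine(Arrays.asList(partialCount(split1), partialCount(split2)));
        System.out.println(total); // prints 5
    }
}
```

The final step never sees the original elements, only the partial counts, which is what lets the combiner shrink the data before it reaches the reducer.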

+ For good performance, it is important that aggregate functions
that are algebraic are implemented as such. Let's look at the implementation of
the `COUNT` function to see what this means. (Error handling and some other code
are omitted to save space. The full code can be accessed
here.)

{{{#!java
public class COUNT extends EvalFunc<Long> implements Algebraic{
@@ -231, +231 @@

|| bag || !DataBag ||
|| map || Map<Object, Object> ||

+ All Pig-specific classes are available
here.

`Tuple` and `DataBag` are different in that they are not concrete classes but
rather interfaces. This enables users to extend Pig with their own versions of
tuples and bags. As a result, UDFs cannot directly instantiate bags or tuples;
they need to go through factory classes: `TupleFactory` and `BagFactory`.

@@ -607, +607 @@

abbreviated version is shown below. The full definition can be seen
here.

{{{#!java
@@ -641, +641 @@

In this query, only `age` needs to be converted to its actual type (`int`)
right away. `name` only needs to be converted in the next step of processing
where the data is likely to be much smaller. `gpa` is not used at all and will
never need to be converted.
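The deferred conversion described above can be sketched in plain Java. This is an illustration under stated assumptions, not Pig's implementation: the `RawField` class, the `conversions` counter, and the age threshold of 20 are all hypothetical. Each field keeps the bytes as they were read and is parsed only when a step asks for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: fields stay as raw bytes until a step needs a typed value.
public class LazyConversionSketch {

    // Counts how many fields were actually converted (hypothetical bookkeeping).
    static int conversions = 0;

    // Hypothetical holder for one field as it was read from the input.
    static final class RawField {
        private final byte[] bytes;

        RawField(String text) {
            bytes = text.getBytes(StandardCharsets.UTF_8);
        }

        int asInt() {
            conversions++;
            return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        }

        String asString() {
            conversions++;
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Builds one row with fields in the order name, age, gpa.
    static RawField[] row(String name, String age, String gpa) {
        return new RawField[] { new RawField(name), new RawField(age), new RawField(gpa) };
    }

    // Converts age for every row, name only for rows that pass, gpa never.
    static List<String> namesOlderThan(List<RawField[]> rows, int minAge) {
        List<String> names = new ArrayList<>();
        for (RawField[] r : rows) {
            if (r[1].asInt() > minAge) {
                names.add(r[0].asString());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<RawField[]> rows = new ArrayList<>();
        rows.add(row("joe", "18", "3.5"));
        rows.add(row("sam", "25", "3.9"));
        rows.add(row("amy", "31", "2.8"));

        System.out.println(namesOlderThan(rows, 20)); // prints [sam, amy]
        System.out.println(conversions);              // prints 5
    }
}
```

Of the nine fields read, only five are converted: three ages up front, two names after filtering, and no `gpa` values at all.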

+ This is the main reason Pig separates reading the data (which
can happen immediately) from converting it (to the right type,
which can happen later). For ASCII data, Pig provides `Utf8StorageConverter`,
which your loader class can extend and which takes care of all the conversion
routines. The code for it can be found