Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Pig Wiki" for change
The following page has been changed by ChrisOlston:
---+++ What is Pig:
* Pig has two parts:
* A language for processing data, called <i>Pig Latin</i>.
* A set of <i>evaluation mechanisms</i> for evaluating a Pig Latin program.
Current evaluation mechanisms include (1) local evaluation in a single JVM, (2)
evaluation by translation into one or more Map-Reduce jobs, executed using
---+++ Pig Latin programs:
* Pig Latin has built-in relational-style operations such as filter, project,
group, and join. Pig Latin also has a map operation that applies a custom user
function to every member of a set. In Pig Latin, the map operation is called
* Additionally, users can incorporate their own custom code into essentially
any Pig Latin operation. For example, if a user has a function that determines
whether a given image contains a human face, the user can ask Pig to filter
images according to this function. Pig will then evaluate this function on the
user's behalf, over the images. If the evaluation mechanism incorporates
parallelism, as is the case with the Hadoop evaluation mechanism, then the
user's function will be executed in parallel.
* Pig can process data of any format. Some standard formats, e.g.,
tab-delimited text files, are supported via built-in capabilities. A user can add
support for a file format by writing a function that parses the bytes of a file
into objects in Pig's data model, and vice versa.
* Pig's data model is similar to the relational data model, except that tuples
can be nested. For example, you can have a table of tuples, where the third
field of each tuple contains a table. In Pig, tables are called bags.
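
The ideas in the list above can be illustrated with a small sketch. It is written in Python, not Pig Latin, and it does not reflect Pig's actual implementation or API. The names `load_tab_delimited`, `store_tab_delimited`, `filter_bag`, and `group_bag` are hypothetical, and a bag is modeled as a plain list of tuples. The sketch shows a format parser and its inverse, a filter that evaluates a user-supplied function (optionally on a thread pool, loosely standing in for a parallel evaluation mechanism), and a grouping whose output tuples carry a nested bag in their third field.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tab_delimited(text):
    """Parse tab-delimited text into a bag (a list of tuples of strings)."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def store_tab_delimited(bag):
    """Inverse of load_tab_delimited: serialize a bag of flat tuples to text."""
    return "\n".join("\t".join(str(field) for field in row) for row in bag)


def filter_bag(bag, predicate, parallel=False):
    """Keep the tuples for which the user-supplied predicate returns True.

    With parallel=True the predicate is evaluated on a thread pool; this is
    only an illustration of running user code in parallel on the user's behalf.
    """
    if parallel:
        with ThreadPoolExecutor() as pool:
            keep = list(pool.map(predicate, bag))
    else:
        keep = [predicate(row) for row in bag]
    return [row for row, wanted in zip(bag, keep) if wanted]


def group_bag(bag, key):
    """Group tuples by key(row).

    Each output tuple is (key, count, nested_bag), so its third field
    is itself a bag, as in the nested data model described above.
    """
    groups = {}
    for row in bag:
        groups.setdefault(key(row), []).append(row)
    return [(k, len(rows), rows) for k, rows in groups.items()]


if __name__ == "__main__":
    # Hypothetical sample data: (user, count) pairs in tab-delimited form.
    visits = load_tab_delimited("alice\t3\nbob\t7\nalice\t5")
    # Custom user function used as the filter condition.
    frequent = filter_bag(visits, lambda row: int(row[1]) >= 5, parallel=True)
    print(frequent)
    print(group_bag(visits, lambda row: row[0]))
```

The predicate passed to `filter_bag` plays the role of the user's custom function (the face detector in the example above); the caller does not change the function when switching between sequential and parallel evaluation.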