Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by ChrisOlston:
http://wiki.apache.org/pig/PigOverview

------------------------------------------------------------------------------
  
  Pig has two parts:
   * A language for processing data, called ''Pig Latin''.
-  * A set of ''evaluation mechanisms'' for evaluating a Pig Latin program. 
Current evaluation mechanisms include (a) local evaluation in a single JVM, (b) 
evaluation by translation into one or more Map-Reduce jobs, executed using 
[lucene.apache.org/hadoop Hadoop].
+  * A set of ''evaluation mechanisms'' for evaluating a Pig Latin program. 
Current evaluation mechanisms include (a) local evaluation in a single JVM, (b) 
evaluation by translation into one or more Map-Reduce jobs, executed using 
[http://lucene.apache.org/hadoop Hadoop].
  
  == Pig Latin programs: ==
  
@@ -23, +23 @@

   * Script file
   * Embed in a host language; currently we support Java as the host language 
(embedding Pig Latin in Java is very similar to JDBC)
  
- == Data formats: ==
+ == Data formats and models: ==
  
   * Pig can process data of any format. (Pigs eat anything! .. or is that 
goats?) A few common formats, such as tab-delimited text files, are supported 
via built-in capabilities. A user can add support for a file format by writing 
a function that parses the bytes of a file into objects in Pig's data model, 
and vice versa.
-  * Pig's data model is similar to the relational data model, except that 
tuples (a.k.a. records or rows) can be nested. For example, you can have a 
table of tuples, where the third field of each tuple contains a table. In Pig, 
tables are called bags.
+  * Pig's data model is similar to the relational data model, except that 
tuples (a.k.a. records or rows) can be nested. For example, you can have a 
table of tuples, where the third field of each tuple contains a table. In Pig, 
tables are called bags. Pig also has a "map" data type, which is useful in 
representing semistructured data, e.g. JSON or XML.
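
As a rough illustration of the data model described in the bullet above, the 
sketch below mimics tuples, bags, and maps with plain Python values. This is 
not Pig's API or Pig Latin; the names `visits`, `profile`, and `flatten`, and 
all of the sample data, are hypothetical and exist only for this example.

```python
# Illustrative only: plain Python stand-ins for Pig's data model, not Pig's API.
# All names and sample values below are hypothetical.

# A tuple is an ordered record of fields; a bag is a collection of tuples.
# Here the third field of each outer tuple is itself a bag (a nested table).
visits = [
    ("alice", 3, [("a.example", 2), ("b.example", 1)]),
    ("bob", 1, [("a.example", 1)]),
]

# A map holds key/value pairs, which suits semistructured records such as
# parsed JSON, where different records may carry different keys.
profile = {"name": "alice", "lang": "en"}


def flatten(bag):
    """Un-nest the inner bag: one output tuple per (outer tuple, inner tuple) pair."""
    return [
        (user, url, hits)
        for user, _total, inner in bag
        for url, hits in inner
    ]


flat = flatten(visits)
```

Un-nesting the inner bag yields an ordinary flat table, which is one way to 
see how the nested model relates to the relational one.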
  
  == Other capabilities: ==
  
