Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigLatin

------------------------------------------------------------------------------
  = Introduction to Pig Latin =
+ 
+ [[TableOfContents]]
  
  So you want to learn Pig Latin. Welcome! Let's begin with the data types.
  
@@ -13, +15 @@

   * A '''Data Bag''' is a set of tuples (duplicate tuples are allowed). You 
may think of it as a "table", except that Pig does not require that the tuple 
field types match, or even that the tuples have the same number of fields! (It 
is up to you whether you want these properties.) We denote bags by { } 
bracketing. Thus, a data bag could be {<apache.org,1.0>, <flickr.com,0.8>}
   * A '''Data Map''' is a map from keys that are string literals to values 
that can be any data type. Think of it as a !HashMap<String,X> where X can be 
any of the 4 Pig data types. A Data Map supports the expected get and put 
interface. We denote maps by [ ] bracketing, with ":" separating the key and 
the value, and ";" separating successive key-value pairs. Thus, a data map 
could be [ 'apache' : <'pig', 'hadoop'> ; 'cnn' : 'news' ]. Here, the key 
'apache' is mapped to the tuple with 2 atomic fields 'pig' and 'hadoop', while 
the key 'cnn' is mapped to the data atom 'news'.
  
- #DataItems
  == Data Items ==
  Data can be referred to in various powerful and convenient ways in Pig. Any 
data referred to is called a Data Item. We will illustrate all these ways by 
using the following example tuple.
  
@@ -28, +29 @@

  || Field referred to by position || $0 || Data Atom '1' || In Pig, positions 
start at 0 and not 1 ||
  || Field referred to by name || f2 || Bag {<2,3>,<4,6>,<5,7>} || ||
  || Projection of another data item || f2.$0 || Bag {<2>,<4>,<5>} - the bag f2 
projected to the first field || ||
- || Map Lookup against another data item || f3#'apache' || Data Atom 'pig' || 
User's responsibility to ensure that a lookup is written only against a  data 
map, otherwise a runtime error is thrown. If the key being looked up does not 
exist, a Data Atom with an empty string is returned ||
+ || Map Lookup against another data item || f3#'apache' || Data Atom 'pig' || 
* It is the user's responsibility to ensure that a lookup is written only 
against a data map; otherwise a runtime error is thrown. [[BR]] * If the key 
being looked up does not exist, a Data Atom with an empty string is returned ||
  || Function applied to another data item || SUM(f2.$0) || 2+4+5 = 11 || SUM 
is a built-in Pig function. See PigFunctions for how to write your own functions 
||
  || Infix Expression of other data items || COUNT(f2) + f1 / '2.0' || 3 + 1 / 
2.0 = 3.5 || ||
  || Bincond, i.e., the value of the data item is chosen according to some 
condition || (f1 == '1' ? '2' : COUNT(f2)) || '2' since f1 == '1' is true. If f1 
were != '1', then the value of this data item for t would be COUNT(f2) = 3 || See 
[#CondS Conditions] for the formats the condition in a bincond can take 
||
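
The rows of the table above can be mimicked in plain Python to make the semantics concrete. This is only an illustrative sketch, not Pig code: the tuple `t` is reconstructed from the values shown in the table (the full example tuple is not visible in this diff), a bag is modeled as a list of tuples, a map as a `dict`, and all variable names are hypothetical.

```python
# Hypothetical Python stand-in for the example tuple t = <f1, f2, f3>.
# Values are reconstructed from the table rows; only the 'apache' key of
# the map f3 is known from the table.
t = {
    "f1": "1",                          # data atom
    "f2": [(2, 3), (4, 6), (5, 7)],     # data bag, modeled as a list of tuples
    "f3": {"apache": "pig"},            # data map, modeled as a dict
}
fields = list(t.values())               # dicts keep insertion order

by_position = fields[0]                         # $0 (positions start at 0)
by_name = t["f2"]                               # f2
projection = [(row[0],) for row in t["f2"]]     # f2.$0
lookup = t["f3"].get("apache", "")              # f3#'apache'
missing = t["f3"].get("nosuchkey", "")          # absent key -> empty string
total = sum(row[0] for row in t["f2"])          # SUM(f2.$0)
infix = len(t["f2"]) + float(t["f1"]) / 2.0     # COUNT(f2) + f1 / '2.0'
bincond = "2" if t["f1"] == "1" else len(t["f2"])  # (f1 == '1' ? '2' : COUNT(f2))

print(by_position, projection, lookup, total, infix, bincond)
```

Note that the atom `f1` is a string here, so the sketch converts it explicitly before dividing; how Pig itself coerces atoms in arithmetic is not specified in this excerpt.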
@@ -43, +44 @@

  
  `grunt> A = load 'data' using PigStorage() as (x, y, z);`
  `grunt> B = group A by x;`
- `grunt> C = foreach B {`
+ `grunt> C = foreach B {`[[BR]]
- 
- `D = distinct A.y;`
+ `D = distinct A.y;` [[BR]]
- 
- `generate flatten(group), COUNT(D);`
+ `generate flatten(group), COUNT(D);` [[BR]]
+ `}`[[BR]]
- 
- `}`
  `grunt>` 
  
+ 
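
As a rough cross-check of what the nested `foreach` above computes, the same result (for each value of x, the number of distinct y values) can be sketched in Python. The function name and the sample rows are hypothetical, and this models only the result, not how Pig evaluates the script.

```python
from collections import defaultdict


def count_distinct_y_per_x(rows):
    """For each x, count the distinct y values among (x, y, z) rows.

    Mirrors: B = group A by x; then, per group, D = distinct A.y and
    generate flatten(group), COUNT(D).
    """
    distinct_y = defaultdict(set)
    for x, y, _z in rows:
        distinct_y[x].add(y)        # a set keeps each y only once per group
    return [(x, len(ys)) for x, ys in distinct_y.items()]


# Hypothetical sample data standing in for the file 'data'.
rows = [("a", 1, 0), ("a", 1, 9), ("a", 2, 0), ("b", 5, 5)]
result = count_distinct_y_per_x(rows)
print(sorted(result))
```

With these sample rows, group "a" has two distinct y values and group "b" has one.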
