Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigLatin

------------------------------------------------------------------------------
  = Introduction to Pig Latin =
+ 
+ [[TableOfContents]]
  
  So you want to learn Pig Latin. Welcome! Lets begin with the data types.
  
@@ -13, +15 @@

   * A '''Data Bag''' is a set of tuples (duplicate tuples are allowed). You may think of it as a "table", except that Pig does not require that the tuple field types match, or even that the tuples have the same number of fields! (It is up to you whether you want these properties.) We denote bags by { } bracketing. Thus, a data bag could be {<apache.org,1.0>, <flickr.com,0.8>}
  
   * A '''Data Map''' is a map from keys that are string literals to values that can be any data type. Think of it as a !HashMap<String,X> where X can be any of the 4 pig data types. A Data Map supports the expected get and put interface. We denote maps by [ ] bracketing, with ":" separating the key and the value, and ";" separating successive key value pairs. Thus. a data map could be [ 'apache' : <'pig', 'hadoop'> ; 'cnn' : 'news' ]. Here, the key 'apache' is mapped to the tuple with 2 atomic fields 'pig' and 'hadoop', while the key 'cnn' is mapped to the data atom 'news'.
  
- #DataItems
  == Data Items ==
  Data can be referred to in various powerful and convenient ways in Pig. Any data referred to is called a Data Item. We will illustrate all these ways by using the following example tuple.
  
@@ -28, +29 @@

  || Field referred to by position || $0 || Data Atom '1' || In Pig, positions start at 0 and not 1 ||
  || Field referred to by name || f2 || Bag {<2,3>,<4,6>,<5,7>} || ||
  || Projection of another data item || f2.$0 || Bag {<2>,<4>,<5>} - the bag f2 projected to the first field || ||
- || Map Lookup against another data item || f3#'apache' || Data Atom 'pig' || User's responsibility to ensure that a lookup is written only against a data map, otherwise a runtime error is thrown. If the key being looked up does not exist, a Data Atom with an empty string is returned ||
+ || Map Lookup against another data item || f3#'apache' || Data Atom 'pig' || * User's responsibility to ensure that a lookup is written only against a data map, otherwise a runtime error is thrown. [[BR]] * If the key being looked up does not exist, a Data Atom with an empty string is returned ||
  || Function applied to another data item || SUM(f2.$0) || 2+4+5 = 11 || SUM is a builtin Pig function. See PigFunctions for how to write your own functions ||
  || Infix Expression of other data items || COUNT(f2) + f1 / '2.0' || 3 + 1 / 2.0 = 3.5 || ||
  || Bincond, i.e., the value of the data item is chosen according to some condition ||(f1 = = '1' ? '2' : COUNT(f2))|| '2' since f1=='1' is true. If f1 were != '1', then the value of this data item for t would be COUNT(f2)=3 || See [#CondS Conditions] for what the format of the condition in the bincond can be ||
  
@@ -43, +44 @@

  `grunt> A = load 'data' using PigStorage() as (x, y, z);`
  
  `grunt>B = group A by x;`
  
- `grunt> C = foreach B {`
+ `grunt> C = foreach B {`[[BR]]
- 
- `D = distinct A.y;`
+ `D = distinct A.y;` [[BR]]
- 
- `generate flatten(group), COUNT(D);`
+ `generate flatten(group), COUNT(D);` [[BR]]
+ `}`[[BR]]
- 
- `}`
  
  `grunt>`
+ 
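
For readers who want to check the Data Items table and the nested `foreach` snippet by hand, the following is a minimal Python sketch of their semantics. It is not Pig code and not how Pig is implemented. The example tuple `t` is not shown in this change, so the value used here is an assumption reconstructed from the table's result column. The helper names (`by_position`, `project`, `map_lookup`, `pig_sum`, `pig_count`, `bincond`, `distinct_count_per_group`) and the sample rows are hypothetical, introduced only for illustration.

```python
# Hypothetical Python model of the Pig data items described above.
# Assumption: the example tuple t is reconstructed from the table values,
# because the tuple itself is not part of this diff.
t = {
    "f1": "1",                       # data atom
    "f2": [(2, 3), (4, 6), (5, 7)],  # data bag: tuples, duplicates allowed
    "f3": {"apache": "pig"},         # data map: string keys to any data type
}
FIELD_ORDER = ["f1", "f2", "f3"]     # resolves positional references such as $0


def by_position(tup, i):
    """$i: field by position; positions start at 0, not 1."""
    return tup[FIELD_ORDER[i]]


def project(bag, i):
    """bag.$i: keep only field i of every tuple in the bag."""
    return [(row[i],) for row in bag]


def map_lookup(value, key):
    """value#'key': error on a non-map, empty string for a missing key."""
    if not isinstance(value, dict):
        raise TypeError("map lookup applied to a data item that is not a map")
    return value.get(key, "")


def pig_sum(bag):
    """SUM over a bag of single-field tuples."""
    return sum(row[0] for row in bag)


def pig_count(bag):
    """COUNT: number of tuples in the bag."""
    return len(bag)


def bincond(tup):
    """(f1 == '1' ? '2' : COUNT(f2))"""
    return "2" if tup["f1"] == "1" else pig_count(tup["f2"])


def distinct_count_per_group(rows, key_index, value_index):
    """Model of: group rows by one field, then COUNT the distinct values
    of another field inside each group (the nested foreach example)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], set()).add(row[value_index])
    return {key: len(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(by_position(t, 0))                          # 1
    print(project(t["f2"], 0))                        # [(2,), (4,), (5,)]
    print(map_lookup(t["f3"], "apache"))              # pig
    print(pig_sum(project(t["f2"], 0)))               # 11
    print(pig_count(t["f2"]) + float(t["f1"]) / 2.0)  # 3.5
    print(bincond(t))                                 # 2
    # Hypothetical (x, y, z) rows standing in for the contents of 'data'.
    rows = [("a", 1, 0), ("a", 1, 9), ("a", 2, 0), ("b", 5, 0)]
    print(distinct_count_per_group(rows, 0, 1))       # {'a': 2, 'b': 1}
```

The map lookup mirrors the two notes in the table: a lookup against anything other than a map raises an error, and a missing key yields an empty string. The grouping helper folds `group A by x`, `distinct A.y` and `COUNT(D)` into one pass, which is a simplification made for this sketch.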