Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigTutorial

------------------------------------------------------------------------------
  || '''UDF''' || '''Description'''||
  || !ExtractHour || Extracts the hour from the record.||
  || N!GramGenerator || Extracts n-grams from the set of words. ||
- || !NonPornDetector|| Removes porn terms from the query field. ||
+ || !NonPornDetector|| Removes the record if the query field includes porn terms. ||
  || NonURLDetector || Removes the record if the query field is empty or a URL. ||
  || !ScoreGenerator || Calculates a "popularity" score for the n-gram.||
- || !ToLower || Switches the query field to lowercase. ||
+ || !ToLower || Changes the query field to lowercase. ||
  || !TutorialUtil || Divides the query string into a set of words.||
  
  
@@ -98, +98 @@

   * Calls the N!GramGenerator UDF to compose the n-grams of the query.
   * Calls the DISTINCT operator to get the unique n-grams for all records.
   * Gets the count (occurrences) of each n-gram.
-  * Calls the !ScoreGenerator UDF to calculate a popularity score for the n-gram.
+  * Calls the !ScoreGenerator UDF to calculate a "popularity" score for the n-gram.
   * Removes all records with a score less than or equal to 2.0.
   * Sorts the remaining records by hour and score.
   * Saves the results. The output file contains a list of n-grams with the following fields: '''hour''', '''ngram''', '''score''', '''count''', '''mean'''
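
The steps listed above can be sketched in plain Python to show how the data flows from queries to scored n-grams. This is only an illustration, not the tutorial's Pig script or its Java UDFs. The function names, the (user, hour, query) record layout, the maximum n-gram length of 2, and the scoring formula (count minus mean, divided by the standard deviation of the n-gram's hourly counts) are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def ngrams(words, max_n=2):
    """Yield every run of 1..max_n consecutive words, joined by spaces.

    Hypothetical stand-in for the NGramGenerator UDF; max_n is an assumption.
    """
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def popular_ngrams(records, threshold=2.0):
    """Return (hour, ngram, score, count, mean) rows, sorted by hour and score.

    records is an iterable of (user, hour, query) tuples (assumed layout).
    """
    # Compose the n-grams of each query; the set plays the role of DISTINCT.
    unique = {
        (user, hour, gram)
        for user, hour, query in records
        for gram in ngrams(query.split())
    }

    # Count the occurrences of each n-gram within each hour.
    counts = Counter((gram, hour) for _user, hour, gram in unique)

    # Collect the hourly counts of every n-gram so they can be compared.
    hourly = defaultdict(dict)
    for (gram, hour), count in counts.items():
        hourly[gram][hour] = count

    rows = []
    for gram, per_hour in hourly.items():
        avg = mean(per_hour.values())
        spread = pstdev(per_hour.values())
        if spread == 0:
            continue  # identical counts in every hour: no score to compute
        for hour, count in per_hour.items():
            # Assumed "popularity" score: distance from the mean in
            # standard deviations. The real ScoreGenerator may differ.
            score = (count - avg) / spread
            # Drop records with a score less than or equal to the threshold.
            if score > threshold:
                rows.append((hour, gram, score, count, avg))

    # Sort the remaining records by hour, then by score.
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows
```

Calling popular_ngrams() on a list of (user, hour, query) tuples returns rows in the same field order as the output file described above: hour, ngram, score, count, mean. Under the assumed scoring formula, an n-gram needs counts in at least six different hours before any single hour can score above 2.0.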
