Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigUserCookbook

------------------------------------------------------------------------------
  || ORDER BY 2 keys || 767 || 472 || 1.6 x ||
  
  ''' Use Types '''
- 
- This feature is only available in the new code currently accessible from types feature (which landed on the main branch in early January: http://svn.apache.org/viewvc/hadoop/pig/trunk/).
  
  If types are not specified in the load statement, Pig assumes the type =double= for numeric computations. Much of the time your data has a smaller type, such as integer or long. Specifying the real type speeds up arithmetic computation and has the additional advantage of early error detection.
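The Python below is a rough model of the two effects, not Pig code. `load_untyped`, `sum_untyped`, `load_typed` and the schema list are hypothetical names. Untyped fields become floating point when used in arithmetic (standing in for =double=), while a declared schema keeps integer arithmetic and rejects a malformed field as soon as the row is loaded. It does not measure Pig's speed difference, and how Pig itself reports bad values is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical schema: one converter per field, standing in for a Pig
# clause such as "as (a: int, b: long)".
Schema = Sequence[Callable[[str], object]]


def load_untyped(lines: List[str]) -> List[Tuple[str, ...]]:
    """Split tab-separated lines; fields stay raw strings (no declared types)."""
    return [tuple(line.split("\t")) for line in lines]


def sum_untyped(row: Tuple[str, ...]) -> float:
    """With no declared types, fields are treated as double when used in math."""
    return float(row[0]) + float(row[1])


def load_typed(lines: List[str], schema: Schema) -> List[Tuple[object, ...]]:
    """Convert every field at load time, so bad data fails immediately."""
    rows: List[Tuple[object, ...]] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split("\t")
        try:
            # zip() stops at the shorter sequence; extra fields are ignored here.
            rows.append(tuple(cast(f) for cast, f in zip(schema, fields)))
        except ValueError as err:
            raise ValueError(f"line {lineno}: {err}") from None
    return rows


if __name__ == "__main__":
    data = ["1\t2", "3\t4"]
    print([sum_untyped(r) for r in load_untyped(data)])      # [3.0, 7.0]
    print([a + b for a, b in load_typed(data, [int, int])])  # [3, 7]
```

Calling `load_typed(["1\t2", "5\tx"], [int, int])` raises `ValueError` mentioning line 2, before any arithmetic runs.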
  
@@ -135, +133 @@

  
  '''Take Advantage of Join Optimization'''
  
- This feature is only available in the new code currently accessible from types branch: http://svn.apache.org/viewvc/hadoop/pig/branches/types/.
- 
  The optimization ensures that the last table in the join is not brought into memory but is streamed through instead. This reduces the amount of memory used, which means you can avoid spilling data to disk and should be able to scale your query to larger data volumes.
  
  To take advantage of this optimization, make sure that the table with the largest number of tuples per key is the last table in your query.
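To make the memory argument concrete, here is a single-process Python model of a join in which every input except the last is buffered by key and the last input is read one row at a time. It is a hash-join sketch under that assumption, not Pig's MapReduce implementation, and `streaming_join` is a hypothetical name. Only the buffered inputs cost memory, which is why the relation with the most tuples per key belongs last.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Hashable, ...]


def streaming_join(inputs: List[Iterable[Row]], key: int = 0) -> Iterator[Tuple[Row, ...]]:
    """Equi-join `inputs` on field `key`, streaming only the LAST input.

    Every earlier input is read fully into per-key buffers. Rows of the last
    input are pulled one at a time and paired with the buffered matches, so
    the last input is never held in memory.
    """
    *buffered_inputs, streamed = inputs
    buffers: List[Dict[Hashable, List[Row]]] = []
    for rel in buffered_inputs:
        table: Dict[Hashable, List[Row]] = defaultdict(list)
        for row in rel:
            table[row[key]].append(row)
        buffers.append(table)

    for row in streamed:
        # Cross product of the buffered matches for this row's key.
        partial: List[Tuple[Row, ...]] = [()]
        for table in buffers:
            matches = table.get(row[key], [])
            partial = [p + (m,) for p in partial for m in matches]
        for p in partial:
            yield p + (row,)


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    large = [(1, "x"), (1, "y"), (2, "z")]
    for joined in streaming_join([small, large]):
        print(joined)
```

Passing `[large, small]` instead produces the same matches with the columns swapped, but buffers all of `large` rather than `small`.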
@@ -172, +168 @@

  dump C; 
  }}}
  
- In pig 1.x, DISTINCT is just GROUP BY/PROJECT under the hood. In pig 2.0 (types branch) it is not, and it is much faster and more efficient (depending on your key cardinality, up to 20x faster in pig team's tests). Therefore, the use of DISTINCT is recommended over GROUP BY - GENERATE.
+ In pig 1.x, DISTINCT is just GROUP BY/PROJECT under the hood. In pig 0.2.0 it is not, and it is much faster and more efficient (depending on your key cardinality, up to 20x faster in the Pig team's tests). Therefore, DISTINCT is recommended over GROUP BY - GENERATE.
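A small Python model can show why the two forms differ in cost, although it is not Pig's implementation and does not reproduce the 20x figure above. `distinct_via_group` and `distinct_direct` are hypothetical names. Grouping keeps every input row in its group's bag before the key is projected, while a direct distinct can drop a duplicate as soon as it sees it. In this model the gap grows with the number of duplicates per distinct row.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def distinct_via_group(rows: Iterable[Hashable]) -> Tuple[List[Hashable], int]:
    """GROUP BY the whole row, then project the group key.

    Every input row is kept in its group's bag until grouping finishes.
    Returns (unique rows in first-seen order, number of rows held in bags).
    """
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for row in rows:
        groups[row].append(row)
    held = sum(len(bag) for bag in groups.values())
    return list(groups), held


def distinct_direct(rows: Iterable[Hashable]) -> Tuple[List[Hashable], int]:
    """DISTINCT: a duplicate is dropped as soon as it is seen.

    Returns (unique rows in first-seen order, number of rows held).
    """
    seen: Dict[Hashable, None] = {}  # dicts keep insertion order (Python 3.7+)
    for row in rows:
        seen.setdefault(row, None)
    return list(seen), len(seen)


if __name__ == "__main__":
    rows = [("a", 1), ("b", 2), ("a", 1), ("a", 1), ("c", 3)]
    print(distinct_via_group(rows))  # ([('a', 1), ('b', 2), ('c', 3)], 5)
    print(distinct_direct(rows))     # ([('a', 1), ('b', 2), ('c', 3)], 3)
```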
  
