The "Pig070LoadStoreHowTo" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/Pig070LoadStoreHowTo?action=diff&rev1=9&rev2=10

--------------------------------------------------

  
  The main motivation for the changes in the Pig 0.7.0 load/store API is to 
move closer to using Hadoop's !InputFormat and !OutputFormat classes. This way, 
Pig users and developers can create new !LoadFunc and !StoreFunc implementations 
based on existing Hadoop !InputFormat and !OutputFormat classes with minimal 
code. The complexity of reading the data and creating a record now lies in the 
!InputFormat; likewise, on the writing end, the complexity of writing lies in 
the !OutputFormat. This enables !Pig to easily read and write data in new 
storage formats as and when a Hadoop !InputFormat and !OutputFormat are 
available for them.  
  
- '''A general note applicable to both LoadFunc and StoreFunc implementations 
is that the implementation should use the new Hadoop 20 API based classes 
(InputFormat/OutputFormat and related classes) under the 
org.apache.hadoop.mapreduce package instead of the old org.apache.hadoop.mapred 
package.'''
+ '''A general note applicable to both !LoadFunc and !StoreFunc implementations 
is that the implementation should use the new Hadoop 20 API based classes 
(!InputFormat/OutputFormat and related classes) under the 
org.apache.hadoop.mapreduce package instead of the old org.apache.hadoop.mapred 
package.'''
  
  = How to implement a Loader =
  
The [[http://svn.apache.org/viewvc/hadoop/pig/trunk/src/org/apache/pig/LoadFunc.java?view=markup
 | LoadFunc]] abstract class has the main methods for loading data, and for 
most use cases it might suffice to extend it. There are 3 other optional 
interfaces which can be implemented to achieve extended functionality:
@@ -26, +26 @@

   * getNext(): The meaning of getNext() has not changed; it is called by the 
Pig runtime to get the next tuple in the data. In the new API, this is the 
method in which the implementation uses the underlying !RecordReader to 
construct a tuple.
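
As a rough illustration of the getNext() contract described above, the following sketch reads one record at a time and turns it into a tuple. It is deliberately independent of Pig and Hadoop so that it runs on its own: the class name `TabLineLoaderSketch` is hypothetical, a plain `Iterator<String>` stands in for the underlying !RecordReader, and a `List<String>` stands in for a Pig tuple. Returning null at the end of input is assumed here as the end-of-data signal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a loader; it does NOT extend Pig's real LoadFunc.
class TabLineLoaderSketch {
    // Stand-in for the RecordReader that the runtime would hand to the loader.
    private final Iterator<String> reader;

    TabLineLoaderSketch(Iterator<String> reader) {
        this.reader = reader;
    }

    // Mirrors the shape of getNext(): one tuple per call, null when input is exhausted.
    List<String> getNext() {
        if (!reader.hasNext()) {
            return null;
        }
        String line = reader.next();
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return new ArrayList<>(Arrays.asList(line.split("\t", -1)));
    }

    public static void main(String[] args) {
        Iterator<String> records = Arrays.asList("a\tb\tc", "1\t\t").iterator();
        TabLineLoaderSketch loader = new TabLineLoaderSketch(records);
        List<String> tuple;
        while ((tuple = loader.getNext()) != null) {
            System.out.println(tuple.size() + " fields: " + tuple);
        }
    }
}
```

In a real loader the record would come from the !RecordReader supplied by the !InputFormat, so the parsing step is the only part that belongs to the loader itself.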
  
  The following methods have default implementations in !LoadFunc and should be 
overridden only if needed:
-  * setUdfContextSignature():This method will be called by Pig both in the 
front end and back end to pass a unique signature to the Loader. The signature 
can be used to store into the !UDFContext any information which the Loader 
needs to store between various method invocations in the front end and back 
end. A use case is to store !RequiredFieldList passed to it in 
!LoadPushDown.pushProjection(!RequiredFieldList) for use in the back end before 
returning tuples in getNext(). The default implementation in !LoadFunc has an 
empty body. This method will be called before other methods.
+  * setUdfContextSignature():This method will be called by Pig both in the 
front end and back end to pass a unique signature to the Loader. The signature 
can be used to store into the UDFContext any information which the Loader needs 
to store between various method invocations in the front end and back end. A 
use case is to store !RequiredFieldList passed to it in 
!LoadPushDown.pushProjection(!RequiredFieldList) for use in the back end before 
returning tuples in getNext(). The default implementation in !LoadFunc has an 
empty body. This method will be called before other methods.
   * relativeToAbsolutePath(): The Pig runtime will call this method to allow 
the Loader to convert a relative load location to an absolute location. The 
default implementation provided in !LoadFunc handles this for !FileSystem 
locations. If the load source is something else, the loader implementation may 
choose to override this.  
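
The signature use case described for setUdfContextSignature() can be sketched as follows. This is only a model under stated assumptions: the class `SignatureLoaderSketch`, the property key `required.columns`, and the static map standing in for the UDFContext are all hypothetical, and pushProjection() takes a plain string here rather than a !RequiredFieldList. The point is that two loader instances which receive the same signature, one in the front end and one in the back end, see the same stored state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical model of a loader that keys its saved state by signature.
class SignatureLoaderSketch {
    // Stand-in for the UDFContext: one Properties object per signature.
    // (Assumption: the real context travels from front end to back end with the job.)
    private static final Map<String, Properties> CONTEXT = new HashMap<>();

    private String signature;

    // Called first, in both front end and back end, with the same unique value.
    void setUdfContextSignature(String signature) {
        this.signature = signature;
    }

    // Front end: remember which columns were requested.
    void pushProjection(String requiredColumns) {
        props().setProperty("required.columns", requiredColumns);
    }

    // Back end: read the saved columns before building tuples; null if none were saved.
    String requiredColumns() {
        return props().getProperty("required.columns");
    }

    private Properties props() {
        return CONTEXT.computeIfAbsent(signature, key -> new Properties());
    }

    public static void main(String[] args) {
        SignatureLoaderSketch frontEnd = new SignatureLoaderSketch();
        frontEnd.setUdfContextSignature("loader-1");
        frontEnd.pushProjection("0,2");

        SignatureLoaderSketch backEnd = new SignatureLoaderSketch();
        backEnd.setUdfContextSignature("loader-1");
        System.out.println("columns seen in back end: " + backEnd.requiredColumns());
    }
}
```

Keying the state by signature keeps two uses of the same loader class in one script from overwriting each other's saved projection.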
  
  == Example Implementation ==
@@ -157, +157 @@

   * relToAbsPathForStoreLocation(): Pig runtime will call this method to allow 
the Storer to convert a relative store location to an absolute location. An 
implementation is provided in !StoreFunc which handles this for FileSystem 
based locations.  
   * checkSchema(): A Store function should implement this function to check 
that a given schema describing the data to be written is acceptable to it. The 
default implementation in !StoreFunc has an empty body. This method will be 
called before any calls to setStoreLocation(). 
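
A storer that cannot write every data type might use checkSchema() along the following lines. The sketch is self-contained and does not use Pig's schema classes: the class `SchemaCheckingStorerSketch`, the list of type names standing in for the schema, and the set of supported types are all hypothetical choices made for illustration. Rejecting the schema by throwing an IOException is likewise an assumption of this sketch.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical storer model; the schema is reduced to a list of type names.
class SchemaCheckingStorerSketch {
    // Assumed for illustration: this storer can only write simple scalar types.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("int", "long", "double", "chararray"));

    // Fails fast, before any store location is set, if a field cannot be written.
    void checkSchema(List<String> fieldTypes) throws IOException {
        for (int i = 0; i < fieldTypes.size(); i++) {
            String type = fieldTypes.get(i);
            if (!SUPPORTED.contains(type)) {
                throw new IOException("Field " + i + " has unsupported type: " + type);
            }
        }
    }

    public static void main(String[] args) {
        SchemaCheckingStorerSketch storer = new SchemaCheckingStorerSketch();
        try {
            storer.checkSchema(Arrays.asList("int", "chararray"));
            System.out.println("first schema accepted");
            storer.checkSchema(Arrays.asList("int", "bag"));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check runs before setStoreLocation() is called, an unacceptable schema is reported before any output is set up.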
  
-  == Example Implementation ==
+ == Example Implementation ==
  The storer implementation in the example is a storer for text data that uses 
'\n' as the line delimiter and '\t' as the default field delimiter (which can 
be overridden by passing a different field delimiter in the constructor). This 
is similar to the current !PigStorage storer in Pig. The new implementation 
uses an existing Hadoop-supported !OutputFormat, TextOutputFormat, as the 
underlying !OutputFormat.
  
  {{{
