Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "Pig070LoadStoreHowTo" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/Pig070LoadStoreHowTo?action=diff&rev1=3&rev2=4

--------------------------------------------------

  = Overview =
This page describes how to go about writing Load functions and Store functions using the API available in Pig 0.7.0.
  
  == How to implement a Loader ==
The [[LoadFunc||http://svn.apache.org/viewvc/hadoop/pig/trunk/src/org/apache/pig/LoadFunc.java?view=markup]] abstract class has the main methods for loading data, and for most use cases it should suffice to extend it. There are three other optional interfaces which can be implemented to achieve extended functionality:
 * !LoadMetadata has methods to deal with metadata - most loader implementations don't need to implement this unless they interact with some metadata system. The getSchema() method in this interface provides a way for loader implementations to communicate the schema of the data back to Pig. If a loader implementation returns data comprised of fields of real types (rather than !DataByteArray fields), it should provide the schema describing the data returned through the getSchema() method. The other methods are concerned with other types of metadata like partition keys and statistics. Implementations can return null for these methods if they are not applicable for that implementation.
 * !LoadPushDown has methods to push operations from the Pig runtime into loader implementations - currently only projections, i.e. the pushProjection() method is called by Pig to communicate to the loader which exact fields are required in the Pig script. The loader implementation can choose to honor the request, or respond that it will not honor the request and return all fields in the data. If a loader implementation is able to efficiently return only the required fields, it should implement !LoadPushDown to improve query performance.
 * !LoadCaster has methods to convert byte arrays to specific types. A loader implementation should implement this if casts (implicit or explicit) from !DataByteArray fields to other types need to be supported.
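
As an illustration of the !LoadMetadata case above, the sketch below shows a loader that reports a fixed schema through getSchema() and returns null for the metadata methods that do not apply. The class and field names are hypothetical, and the exact constructors should be checked against the Pig 0.7.0 javadocs:

{{{
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.Expression;
import org.apache.pig.LoadFunc;
import org.apache.pig.LoadMetadata;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Hypothetical loader that knows its data is (name:chararray, age:int)
// and communicates that schema back to Pig via LoadMetadata.
public abstract class SchemaAwareLoader extends LoadFunc implements LoadMetadata {

    @Override
    public ResourceSchema getSchema(String location, Job job) throws IOException {
        // Describe the real types of the returned fields so Pig does not
        // treat them as DataByteArray.
        Schema schema = new Schema();
        schema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        schema.add(new Schema.FieldSchema("age", DataType.INTEGER));
        return new ResourceSchema(schema);
    }

    // Statistics and partition keys are not applicable for this loader,
    // so return null as described above.
    @Override
    public ResourceStatistics getStatistics(String location, Job job) throws IOException {
        return null;
    }

    @Override
    public String[] getPartitionKeys(String location, Job job) throws IOException {
        return null;
    }

    @Override
    public void setPartitionFilter(Expression partitionFilter) throws IOException {
        // No partitions to filter.
    }
}
}}}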
  
The !LoadFunc abstract class is the main class to extend to implement a loader. The methods which need to be overridden are explained below:
 * getInputFormat(): This method will be called by Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying !RecordReader) will be called by Pig in the same manner (and in the same context) as by Hadoop in a map-reduce Java program. If the !InputFormat is a Hadoop-packaged one, the implementation should use the new API based one under org.apache.hadoop.mapreduce. If it is a custom !InputFormat, it should be implemented using the new API in org.apache.hadoop.mapreduce.
 * setLocation(): This method is called by Pig to communicate the load location to the loader. The loader should use this method to communicate the same information to the underlying !InputFormat. This method is called multiple times by Pig - implementations should bear this in mind and ensure there are no inconsistent side effects due to the multiple calls.
 * prepareToRead(): Through this method the !RecordReader associated with the !InputFormat provided by the !LoadFunc is passed to the !LoadFunc. The !RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to Pig.
 * getNext(): The meaning of getNext() has not changed: it is called by the Pig runtime to get the next tuple in the data. In the new API, this is the method wherein the implementation will use the underlying !RecordReader and construct a tuple.
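
Putting the four methods above together, a minimal loader might look like the following sketch. It is illustrative only - the class name !LineLoader and the one-field-per-line tuple handling are assumptions, not part of this page - and it targets the Pig 0.7.0 API with the new-API Hadoop !TextInputFormat:

{{{
import java.io.IOException;
import java.util.Collections;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical minimal loader: each line of text becomes a one-field tuple.
public class LineLoader extends LoadFunc {

    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Use a new-API (org.apache.hadoop.mapreduce) InputFormat.
        return new TextInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Pass the load location through to the underlying InputFormat.
        // This may be called multiple times; setInputPaths simply
        // overwrites the previous value, so repeated calls are safe.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        // Keep the RecordReader handed to us for use in getNext().
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null; // end of data
            }
            Text line = (Text) reader.getCurrentValue();
            // Simplified: emit the whole line as a single chararray field.
            return tupleFactory.newTuple(Collections.singletonList(line.toString()));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
}}}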
  
  The following methods have default implementations in !LoadFunc and should be 
overridden only if needed:
 * setUdfContextSignature(): This method will be called by Pig both in the front end and the back end to pass a unique signature to the Loader. The signature can be used to store into the !UDFContext any information which the Loader needs to store between various method invocations in the front end and back end. A use case is to store the !RequiredFieldList passed to it in !LoadPushDown.pushProjection(RequiredFieldList) for use in the back end before returning tuples in getNext(). The default implementation in !LoadFunc has an empty body. This method will be called before other methods.
 * relativeToAbsolutePath(): The Pig runtime will call this method to allow the Loader to convert a relative load location to an absolute location. The default implementation provided in !LoadFunc handles this for !FileSystem locations. If the load source is something else, the loader implementation may choose to override this.
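
The setUdfContextSignature()/!UDFContext pattern described above can be sketched as follows. The class name, the property key, and the back-end helper are assumptions for illustration, and the exception handling should be checked against the Pig 0.7.0 javadocs:

{{{
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.pig.LoadFunc;
import org.apache.pig.LoadPushDown;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.util.ObjectSerializer;
import org.apache.pig.impl.util.UDFContext;

// Hypothetical loader fragment showing the signature/UDFContext pattern.
public abstract class ProjectionAwareLoader extends LoadFunc implements LoadPushDown {

    private String signature;

    @Override
    public void setUdfContextSignature(String signature) {
        // Called before other methods, on both the front end and back end;
        // the signature keys this loader's slot in the UDFContext.
        this.signature = signature;
    }

    @Override
    public List<OperatorSet> getFeatures() {
        return Collections.singletonList(OperatorSet.PROJECTION);
    }

    @Override
    public RequiredFieldResponse pushProjection(RequiredFieldList requiredFieldList)
            throws FrontendException {
        // Front end: remember which fields the script actually needs.
        Properties props = UDFContext.getUDFContext()
                .getUDFProperties(this.getClass(), new String[] { signature });
        try {
            props.setProperty("requiredFields",
                    ObjectSerializer.serialize(requiredFieldList));
        } catch (IOException e) {
            throw new FrontendException("could not serialize required field list");
        }
        return new RequiredFieldResponse(true);
    }

    // Back end: the same signature retrieves the stored projection,
    // typically before the first getNext() call.
    protected RequiredFieldList getRequiredFields() throws IOException {
        Properties props = UDFContext.getUDFContext()
                .getUDFProperties(this.getClass(), new String[] { signature });
        String serialized = props.getProperty("requiredFields");
        return serialized == null ? null
                : (RequiredFieldList) ObjectSerializer.deserialize(serialized);
    }
}
}}}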
  
  === Example Implementation ===
The loader implementation in the example is a loader for text data with '\n' as the line delimiter and '\t' as the default field delimiter (which can be overridden by passing a different field delimiter in the constructor) - this is similar to the current !PigStorage loader in Pig. The new implementation uses an existing Hadoop-supported !InputFormat - !TextInputFormat - as the underlying !InputFormat.
  
  {{{
