Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "LoadStoreMigrationGuide" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=10&rev2=11

--------------------------------------------------

  
  The main change is that the new !LoadFunc API is based on a !InputFormat to 
read the data. Implementations can choose to use existing !InputFormats like 
!TextInputFormat or implement a new one.
   
- == Table mapping old API calls to new API calls in rough order of call 
sequence==
+ == Table mapping old API calls to new API calls in rough order of call 
sequence ==
  || '''Old Method in !LoadFunc''' || '''Equivalent New Method''' || '''New 
Class/Interface in which method is present''' || '''Explanation''' ||
  || No equivalent method || setUDFContextSignature() || !LoadFunc || This 
method will be called by Pig both in the front end and back end to pass a 
unique signature to the Loader. The signature can be used to store into the 
UDFContext any information which the Loader needs to store between various 
method invocations in the front end and back end. A use case is to store 
!RequiredFieldList passed to it in 
!LoadPushDown.pushProjection(!RequiredFieldList) for use in the back end before 
returning tuples in getNext()||
  || No equivalent method || relativeToAbsolutePath() || !LoadFunc || Pig 
runtime will call this method to allow the Loader to convert a relative load 
location to an absolute location. The default implementation provided in 
!LoadFunc handles this for hdfs files and directories. If the load source is 
something else, loader implementation may choose to override this.||
@@ -22, +22 @@

  || getNext() || getNext() || !LoadFunc || The meaning of getNext() has not 
changed; it is called by the Pig runtime to get the next tuple in the data ||
  || bytesToInteger(),...bytesToBag() ||  bytesToInteger(),...bytesToBag() || 
!LoadCaster || The meaning of these methods has not changed; they are called by 
the Pig runtime to cast !DataByteArray fields to the right type when needed. In 
the new API, a !LoadFunc implementation should return a !LoadCaster object 
to Pig from its getLoadCaster() method so that it can be used for 
casting. If null is returned, casting from !DataByteArray to any other 
type (implicitly or explicitly) in the Pig script will not be possible ||
  
+ An example of how a simple !LoadFunc implementation based on the old interface 
can be converted to the new interfaces is shown in the Examples section below. 
+ 
+ 
+ 
+ == Examples ==
+ 
+ === Loader ===
+ 
-  An example of how a simple !LoadFunc implementation based on old interface 
can be converted to the new interfaces will be shown below. The loader 
implementation in the example is a loader for text data with line delimiter as 
'\n' and '\t' as default field delimiter (which can be overridden by passing a 
different field delimiter in the constructor) - this is similar to current 
!PigStorage loader in Pig. The new implementation uses an existing Hadoop 
supported !Inputformat - !TextInputFormat as the underlying !InputFormat.
+ The loader in the example reads text data with '\n' as the line 
delimiter and '\t' as the default field delimiter (which can be overridden 
by passing a different field delimiter to the constructor); this is similar to 
the current !PigStorage loader in Pig. The new implementation uses an existing 
Hadoop-supported !InputFormat, !TextInputFormat, as the underlying !InputFormat.
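
The record and field handling described above can be sketched in plain Java, without any Pig or Hadoop classes. This is only an illustration of the delimiter behaviour ('\n' between records, '\t' between fields unless another delimiter is passed to the constructor); the class name DelimitedLineSplitter and its methods are hypothetical and are not part of the Pig API or of the example loader itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Pig class): splits text into records and fields
 * the way the example loader is described as doing.
 */
final class DelimitedLineSplitter {
    private final char fieldDelimiter;

    /** Uses '\t' as the default field delimiter. */
    DelimitedLineSplitter() {
        this('\t');
    }

    /** Overrides the field delimiter, mirroring the constructor argument in the description. */
    DelimitedLineSplitter(char fieldDelimiter) {
        this.fieldDelimiter = fieldDelimiter;
    }

    /** Splits one line (without its trailing '\n') into fields, keeping empty fields. */
    List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == fieldDelimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        // The last field runs from the final delimiter to the end of the line.
        fields.add(line.substring(start));
        return fields;
    }

    /** Splits a block of text into records on '\n', then splits each record into fields. */
    List<List<String>> splitText(String text) {
        List<List<String>> records = new ArrayList<>();
        if (text.isEmpty()) {
            return records;
        }
        for (String line : text.split("\n")) {
            records.add(splitLine(line));
        }
        return records;
    }
}
```

In the actual loader the record boundaries would come from !TextInputFormat rather than from a split on '\n'; the sketch only makes the delimiter rules concrete.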
  
  == Old Implementation ==
  {{{
