Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "LoadStoreMigrationGuide" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=6&rev2=7

--------------------------------------------------

  
  = LoadFunc Migration =
  The methods in the old !LoadFunc have been split among a !LoadFunc abstract 
class and three new interfaces: !LoadMetadata (methods to deal with metadata), 
!LoadPushDown (methods to push operations from the Pig runtime into loader 
implementations) and !LoadCaster (methods to convert byte arrays to specific 
types). The example below shows how a simple !LoadFunc implementation based on 
the old interface can be converted to the new interfaces. The loader in the 
example reads text data with '\n' as the line delimiter and '\t' as the default 
field delimiter (a different field delimiter can be passed in the constructor); 
this is similar to the current !PigStorage loader in Pig.
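The record format described above ('\n' between records, '\t' between fields unless another delimiter is passed to the constructor) can be sketched in plain Java. This is a minimal sketch only: the class !DelimitedLineParser and its methods are hypothetical and are not part of Pig. The real !PigStorage works on bytes and produces Pig tuples rather than lists of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Pig class: splits one line of text into fields
// the way a PigStorage-like loader would, using '\t' unless told otherwise.
final class DelimitedLineParser {
    private final char fieldDel;

    // Default constructor: tab-delimited fields.
    DelimitedLineParser() {
        this('\t');
    }

    // Mirrors a loader constructor that overrides the field delimiter.
    DelimitedLineParser(char fieldDel) {
        this.fieldDel = fieldDel;
    }

    // Splits a single record (its '\n' terminator already removed) into fields.
    // Empty fields between adjacent delimiters are preserved.
    List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == fieldDel) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // text after the last delimiter
        return fields;
    }

    public static void main(String[] args) {
        DelimitedLineParser tabs = new DelimitedLineParser();
        System.out.println(tabs.parse("alice\t30\tNY")); // [alice, 30, NY]
        DelimitedLineParser commas = new DelimitedLineParser(',');
        System.out.println(commas.parse("bob,,LA"));     // [bob, , LA]
    }
}
```

This sketch keeps empty fields, so "bob,,LA" split on ',' yields three fields, the middle one empty.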
+ 
+ == Table mapping old API calls to new API calls ==
+ || '''Old Method in !LoadFunc''' || '''Equivalent New Method''' || '''New 
Class/Interface in which method is present''' || '''Explanation''' ||
+ || bindTo() || prepareToRead() || !LoadFunc || bindTo() was the old method 
that provided the !LoadFunc with an !InputStream, among other things. The 
!LoadFunc implementation would then read from the !InputStream in getNext(). In 
the new API, data is read through the !InputFormat provided by the !LoadFunc. So 
the equivalent call is prepareToRead(), in which the !RecordReader associated 
with the !InputFormat provided by the !LoadFunc is passed to the !LoadFunc. The 
implementation can then use the !RecordReader in getNext() to return a tuple 
representing a record of data back to Pig. ||
+ || getNext() || getNext() || !LoadFunc || The meaning of getNext() has not 
changed: it is called by the Pig runtime to get the next tuple in the data. ||
+ || bytesToInteger(),...bytesToBag() ||  bytesToInteger(),...bytesToBag() || 
!LoadCaster || The meaning of these methods has not changed: they are called by 
the Pig runtime to cast !DataByteArray fields to the right type when needed. In 
the new API, a !LoadFunc implementation should return a !LoadCaster object from 
its getLoadCaster() method so that Pig can use it for casting. If 
getLoadCaster() returns null, casting from !DataByteArray to any other type 
(implicitly or explicitly) in the Pig script will not be possible. ||
+ || fieldsToRead() || pushProjection() || !LoadPushDown || fieldsToRead() was 
used by old code to convey to the loader the exact fields required by the Pig 
script; the same semantics are now achieved through pushProjection() of the 
!LoadPushDown interface. !LoadPushDown is an optional interface for loaders to 
implement. If a loader does not implement it, the Pig runtime assumes that the 
loader cannot return just the required fields and will return all fields in the 
data. A loader implementation that can efficiently return only the required 
fields should implement !LoadPushDown to improve query performance. ||
+ || determineSchema() || getSchema() || !LoadMetadata || determineSchema() was 
used by old code to ask the loader to provide a schema for the data it returns; 
the same semantics are now achieved through getSchema() of the !LoadMetadata 
interface. !LoadMetadata is an optional interface for loaders to implement. If 
a loader does not implement it, the Pig runtime treats the loader as unable to 
provide a schema for its data. || 
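As a rough illustration of how the pieces in the table fit together, here is a new-style loader modelled in plain Java so that it compiles without Pig on the classpath. Everything in it is hypothetical: !SketchLoader, !LineReader and !SimpleCaster are stand-ins invented for this example, and the method signatures are simplified. They are not the signatures of Pig's actual !LoadFunc, !LoadPushDown or !LoadCaster. The sketch shows a reader handed over in prepareToRead() and used in getNext(), a projection remembered so only the requested columns are returned, and a caster that converts raw bytes to an integer only when asked.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a loader in the style of the new API. Method names
// echo the migration table; signatures are illustrative, not Pig's real ones.
final class SketchLoader {
    // Stand-in for the RecordReader passed to prepareToRead(); it only models
    // "give me the next record" and is not Hadoop's RecordReader.
    interface LineReader {
        String nextLine(); // null once the input is exhausted
    }

    // Stand-in for a LoadCaster: converts raw field bytes to a typed value.
    interface SimpleCaster {
        Integer bytesToInteger(byte[] b);
    }

    private final Pattern fieldDel;
    private LineReader reader;    // replaces the InputStream that bindTo() supplied
    private int[] requiredFields; // null means "return every field"

    SketchLoader(char fieldDel) {
        this.fieldDel = Pattern.compile(Pattern.quote(String.valueOf(fieldDel)));
    }

    // Counterpart of prepareToRead(): keep the reader for use in getNext().
    void prepareToRead(LineReader reader) {
        this.reader = reader;
    }

    // Counterpart of pushProjection(): remember the columns the script needs.
    // Returning true tells the caller that the projection will be honoured.
    boolean pushProjection(int[] columns) {
        this.requiredFields = columns.clone();
        return true;
    }

    // Counterpart of getLoadCaster(): returning null would mean "no casts".
    SimpleCaster getLoadCaster() {
        return b -> {
            try {
                return Integer.valueOf(new String(b, StandardCharsets.UTF_8).trim());
            } catch (NumberFormatException e) {
                return null; // this sketch maps unparsable input to null
            }
        };
    }

    // Counterpart of getNext(): one record per call, null at end of data.
    // Fields stay as raw bytes (like DataByteArray) until a cast is requested.
    List<byte[]> getNext() {
        String line = reader.nextLine();
        if (line == null) {
            return null;
        }
        String[] all = fieldDel.split(line, -1); // -1 keeps trailing empty fields
        List<byte[]> tuple = new ArrayList<>();
        if (requiredFields == null) {
            for (String f : all) {
                tuple.add(f.getBytes(StandardCharsets.UTF_8));
            }
        } else {
            for (int col : requiredFields) {
                // A requested column missing from this record becomes null.
                tuple.add(col < all.length ? all[col].getBytes(StandardCharsets.UTF_8) : null);
            }
        }
        return tuple;
    }

    public static void main(String[] args) {
        Iterator<String> lines = List.of("alice\t30\tNY", "bob\t25\tLA").iterator();
        SketchLoader loader = new SketchLoader('\t');
        loader.prepareToRead(() -> lines.hasNext() ? lines.next() : null);
        loader.pushProjection(new int[] {1}); // the script only uses column 1
        SimpleCaster caster = loader.getLoadCaster();
        List<byte[]> t;
        while ((t = loader.getNext()) != null) {
            System.out.println(caster.bytesToInteger(t.get(0))); // prints 30, then 25
        }
    }
}
```

Running main() prints 30 and then 25. Only column 1 is materialised, and it stays as bytes until the caster is applied, which mirrors the "cast only when needed" behaviour described for !LoadCaster in the table.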
  
  == Old Implementation ==
  {{{
@@ -232, +240 @@

  
  }}}
  
- == Table mapping old API calls to new API calls ==
- || '''Old Method in !LoadFunc''' || '''Equivalent New Method''' || '''New 
Class/Interface in which method is present''' || '''Explanation''' ||
- || bindTo() || prepareToRead() || !LoadFunc || bindTo() was the old method 
which would provide an InputStream among other things to the !LoadFunc to allow 
it store the stream to use in getNext(). In the new API, reading of the data is 
through the !InputFormat provided by the !LoadFunc. So the equivalent call is 
prepareToRead() wherein the !RecordReader associated with the !InputFormat 
provided by the !LoadFunc is passed to the !LoadFunc. The !RecordReader can 
then be used by the implementation in getNext() to return a tuple representing 
a record of data back to pig. ||
- || getNext() || getNext() || !LoadFunc ||
- 
