Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "LoadStoreMigrationGuide" page has been changed by PradeepKamath. http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=37&rev2=38 -------------------------------------------------- ||No equivalent method ||relativeToAbsolutePath() ||!LoadFunc ||Pig runtime will call this method to allow the Loader to convert a relative load location to an absolute location. The default implementation provided in !LoadFunc handles this for !FileSystem locations. If the load source is something else, loader implementation may choose to override this. || ||determineSchema() ||getSchema() ||!LoadMetadata ||determineSchema() was used by old code to ask the loader to provide a schema for the data returned by it - the same semantics are now achieved through getSchema() of the !LoadMetadata interface. !LoadMetadata is an optional interface for loaders to implement - if a loader does not implement it, this will indicate to the pig runtime that the loader cannot return a schema for the data || ||fieldsToRead() ||pushProjection() ||!LoadPushDown ||fieldsToRead() was used by old code to convey to the loader the exact fields required by the pig script -the same semantics are now achieved through pushProject() of the !LoadPushDown interface. !LoadPushDown is an optional interface for loaders to implement - if a loader does not implement it, this will indicate to the pig runtime that the loader is not capable of returning just the required fields and will return all fields in the data. If a loader implementation is able to efficiently return only required fields, it should implement !LoadPushDown to improve query performance || - ||No equivalent method ||getInputFormat() ||!LoadFunc ||This method will be called by Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying !RecordReader) will be called by pig in the same manner (and in the same context) as by Hadoop in a map-reduce java program. '''If the !InputFormat is a hadoop packaged one, the implementation should use the new API based one under org.apache.hadoop.mapreduce. If it is a custom !InputFormat, it should be implemented using the new API in org.apache.hadoop.mapreduce'''. If a custom loader using a text-based InputFormat or a file based InputFormat would like to read files in all subdirectories under a given input directory recursively, then it should use the PigFileInputFormat and PigTextInputFormat classes provided in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer. This is to work around the current limitation in Hadoop's TextInputFormat and FileInputFormat which only read one level down from provided input directory. So for example if the input in the load statement is 'dir1' and there are subdirs 'dir2' and 'dir2/dir3' underneath dir1, using Hadoop's TextInputFormat or FileInputFormat only files under 'dir1' can be read. Using PigFileInputFormat or PigTextInputFormat (or by extending them), files in all the directories can be read.|| + ||No equivalent method ||getInputFormat() ||!LoadFunc ||This method will be called by Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying !RecordReader) will be called by pig in the same manner (and in the same context) as by Hadoop in a map-reduce java program. '''If the !InputFormat is a hadoop packaged one, the implementation should use the new API based one under org.apache.hadoop.mapreduce. If it is a custom !InputFormat, it should be implemented using the new API in org.apache.hadoop.mapreduce'''. 
If a custom loader using a text-based !InputFormat or a file based !InputFormat would like to read files in all subdirectories under a given input directory recursively, then it should use the !PigFileInputFormat and !PigTextInputFormat classes provided in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer. This is to work around the current limitation in Hadoop's !TextInputFormat and !FileInputFormat which only read one level down from provided input directory. So for example if the input in the load statement is 'dir1' and there are subdirs 'dir2' and 'dir2/dir3' underneath dir1, using Hadoop's !TextInputFormat or !FileInputFormat only files under 'dir1' can be read. Using !PigFileInputFormat or !PigTextInputFormat (or by extending them), files in all the directories can be read.|| ||No equivalent method ||setLocation() ||!LoadFunc ||This method is called by Pig to communicate the load location to the loader. The loader should use this method to communicate the same information to the underlying !InputFormat. This method is called multiple times by pig - implementations should bear in mind that this method is called multiple times and should ensure there are no inconsistent side effects due to the multiple calls. || ||bindTo() ||prepareToRead() ||!LoadFunc ||bindTo() was the old method which would provide an !InputStream among other things to the !LoadFunc. The !LoadFunc implementation would then read from the !InputStream in getNext(). In the new API, reading of the data is through the !InputFormat provided by the !LoadFunc. So the equivalent call is prepareToRead() wherein the !RecordReader associated with the !InputFormat provided by the !LoadFunc is passed to the !LoadFunc. The !RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to pig. || ||getNext() ||getNext() ||!LoadFunc ||The meaning of getNext() has not changed and is called by Pig runtime to get the next tuple in the data - in the new API, this is the method wherein the implementation will use the the underlying !RecordReader and construct a tuple ||
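To make the mapping above concrete, the following is a minimal sketch of a loader written purely against the new API, loosely modeled on !PigStorage. The class name SimpleTextLoader and the tab delimiter are illustrative assumptions, not something prescribed by the table above.

{{{
// Minimal sketch of a new-API loader (class name and delimiter are assumptions).
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SimpleTextLoader extends LoadFunc {
    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Hand Pig a new-API (org.apache.hadoop.mapreduce) InputFormat.
        // PigTextInputFormat could be returned instead if recursive directory
        // reads are needed, as described in the table above.
        return new TextInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Pass the load location straight through to the InputFormat; this
        // method may be called several times, so avoid other side effects.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        // Replaces bindTo(): keep the RecordReader for use in getNext().
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;                       // end of input
            }
            Text line = (Text) reader.getCurrentValue();
            List<Object> fields = new ArrayList<Object>();
            for (String field : line.toString().split("\t", -1)) {
                fields.add(field);
            }
            return tupleFactory.newTuple(fields);  // one tuple per input line
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
}}}

A script would use such a loader like any other, e.g. A = LOAD 'dir1' USING SimpleTextLoader();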
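The optional interfaces follow the same pattern. As an illustration only (the two-field schema below is an assumption, not dictated by the guide), a loader could layer !LoadMetadata on top of the sketch above to replace the old determineSchema(); !LoadPushDown's pushProjection() would be added the same way for loaders that can efficiently skip unneeded fields.

{{{
// Hypothetical sketch: adding the optional LoadMetadata interface so the
// loader can hand Pig a schema for the tuples it returns.
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.Expression;
import org.apache.pig.LoadMetadata;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class SimpleTextLoaderWithSchema extends SimpleTextLoader implements LoadMetadata {

    @Override
    public ResourceSchema getSchema(String location, Job job) throws IOException {
        // Replaces determineSchema(): describe the tuples produced by getNext().
        Schema schema = new Schema();
        schema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        schema.add(new Schema.FieldSchema("value", DataType.CHARARRAY));
        return new ResourceSchema(schema);
    }

    @Override
    public ResourceStatistics getStatistics(String location, Job job) throws IOException {
        return null;   // no statistics available for this data
    }

    @Override
    public String[] getPartitionKeys(String location, Job job) throws IOException {
        return null;   // the data is not partitioned
    }

    @Override
    public void setPartitionFilter(Expression partitionFilter) throws IOException {
        // no-op: getPartitionKeys() returned null, so Pig will not push a filter
    }
}
}}}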
