Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "LoadStoreMigrationGuide" page has been changed by PradeepKamath. http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=37&rev2=38 -------------------------------------------------- ||No equivalent method ||relativeToAbsolutePath() ||!LoadFunc ||Pig runtime will call this method to allow the Loader to convert a relative load location to an absolute location. The default implementation provided in !LoadFunc handles this for !FileSystem locations. If the load source is something else, loader implementation may choose to override this. || ||determineSchema() ||getSchema() ||!LoadMetadata ||determineSchema() was used by old code to ask the loader to provide a schema for the data returned by it - the same semantics are now achieved through getSchema() of the !LoadMetadata interface. !LoadMetadata is an optional interface for loaders to implement - if a loader does not implement it, this will indicate to the pig runtime that the loader cannot return a schema for the data || ||fieldsToRead() ||pushProjection() ||!LoadPushDown ||fieldsToRead() was used by old code to convey to the loader the exact fields required by the pig script -the same semantics are now achieved through pushProject() of the !LoadPushDown interface. !LoadPushDown is an optional interface for loaders to implement - if a loader does not implement it, this will indicate to the pig runtime that the loader is not capable of returning just the required fields and will return all fields in the data. If a loader implementation is able to efficiently return only required fields, it should implement !LoadPushDown to improve query performance || - ||No equivalent method ||getInputFormat() ||!LoadFunc ||This method will be called by Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying !RecordReader) will be called by pig in the same manner (and in the same context) as by Hadoop in a map-reduce java program. '''If the !InputFormat is a hadoop packaged one, the implementation should use the new API based one under org.apache.hadoop.mapreduce. If it is a custom !InputFormat, it should be implemented using the new API in org.apache.hadoop.mapreduce'''. If a custom loader using a text-based InputFormat or a file based InputFormat would like to read files in all subdirectories under a given input directory recursively, then it should use the PigFileInputFormat and PigTextInputFormat classes provided in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer. This is to work around the current limitation in Hadoop's TextInputFormat and FileInputFormat which only read one level down from provided input directory. So for example if the input in the load statement is 'dir1' and there are subdirs 'dir2' and 'dir2/dir3' underneath dir1, using Hadoop's TextInputFormat or FileInputFormat only files under 'dir1' can be read. Using PigFileInputFormat or PigTextInputFormat (or by extending them), files in all the directories can be read.|| + ||No equivalent method ||getInputFormat() ||!LoadFunc ||This method will be called by Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying !RecordReader) will be called by pig in the same manner (and in the same context) as by Hadoop in a map-reduce java program. '''If the !InputFormat is a hadoop packaged one, the implementation should use the new API based one under org.apache.hadoop.mapreduce. If it is a custom !InputFormat, it should be implemented using the new API in org.apache.hadoop.mapreduce'''. 
If a custom loader using a text-based !InputFormat or a file based !InputFormat would like to read files in all subdirectories under a given input directory recursively, then it should use the !PigFileInputFormat and !PigTextInputFormat classes provided in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer. This is to work around the current limitation in Hadoop's !TextInputFormat and !FileInputFormat which only read one level down from provided input directory. So for example if the input in the load statement is 'dir1' and there are subdirs 'dir2' and 'dir2/dir3' underneath dir1, using Hadoop's !TextInputFormat or !FileInputFormat only files under 'dir1' can be read. Using !PigFileInputFormat or !PigTextInputFormat (or by extending them), files in all the directories can be read.|| ||No equivalent method ||setLocation() ||!LoadFunc ||This method is called by Pig to communicate the load location to the loader. The loader should use this method to communicate the same information to the underlying !InputFormat. This method is called multiple times by pig - implementations should bear in mind that this method is called multiple times and should ensure there are no inconsistent side effects due to the multiple calls. || ||bindTo() ||prepareToRead() ||!LoadFunc ||bindTo() was the old method which would provide an !InputStream among other things to the !LoadFunc. The !LoadFunc implementation would then read from the !InputStream in getNext(). In the new API, reading of the data is through the !InputFormat provided by the !LoadFunc. So the equivalent call is prepareToRead() wherein the !RecordReader associated with the !InputFormat provided by the !LoadFunc is passed to the !LoadFunc. The !RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to pig. || ||getNext() ||getNext() ||!LoadFunc ||The meaning of getNext() has not changed and is called by Pig runtime to get the next tuple in the data - in the new API, this is the method wherein the implementation will use the the underlying !RecordReader and construct a tuple ||
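To make the mapping above concrete, the following is a minimal sketch of a loader written purely against the new API, loosely modeled on !PigStorage. The class name SimpleTextLoader and the tab delimiter are illustrative assumptions, not something prescribed by the table above.

{{{
// Minimal sketch of a new-API loader (class name and delimiter are assumptions).
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SimpleTextLoader extends LoadFunc {
    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Hand Pig a new-API (org.apache.hadoop.mapreduce) InputFormat.
        // PigTextInputFormat could be returned instead if recursive directory
        // reads are needed, as described in the table above.
        return new TextInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Pass the load location straight through to the InputFormat; this
        // method may be called several times, so avoid other side effects.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        // Replaces bindTo(): keep the RecordReader for use in getNext().
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;                       // end of input
            }
            Text line = (Text) reader.getCurrentValue();
            List<Object> fields = new ArrayList<Object>();
            for (String field : line.toString().split("\t", -1)) {
                fields.add(field);
            }
            return tupleFactory.newTuple(fields);  // one tuple per input line
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
}}}

A script would use such a loader like any other, e.g. A = LOAD 'dir1' USING SimpleTextLoader();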
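The optional interfaces follow the same pattern. As an illustration only (the two-field schema below is an assumption, not dictated by the guide), a loader could layer !LoadMetadata on top of the sketch above to replace the old determineSchema(); !LoadPushDown's pushProjection() would be added the same way for loaders that can efficiently skip unneeded fields.

{{{
// Hypothetical sketch: adding the optional LoadMetadata interface so the
// loader can hand Pig a schema for the tuples it returns.
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.Expression;
import org.apache.pig.LoadMetadata;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class SimpleTextLoaderWithSchema extends SimpleTextLoader implements LoadMetadata {

    @Override
    public ResourceSchema getSchema(String location, Job job) throws IOException {
        // Replaces determineSchema(): describe the tuples produced by getNext().
        Schema schema = new Schema();
        schema.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        schema.add(new Schema.FieldSchema("value", DataType.CHARARRAY));
        return new ResourceSchema(schema);
    }

    @Override
    public ResourceStatistics getStatistics(String location, Job job) throws IOException {
        return null;   // no statistics available for this data
    }

    @Override
    public String[] getPartitionKeys(String location, Job job) throws IOException {
        return null;   // the data is not partitioned
    }

    @Override
    public void setPartitionFilter(Expression partitionFilter) throws IOException {
        // no-op: getPartitionKeys() returned null, so Pig will not push a filter
    }
}
}}}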
