Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/StorageFunction

New page:
[[Anchor(Load/Store_Functions)]]
== Load/Store Functions ==
Load/Store Functions are written by implementing one or both of the interfaces 
given below. 

If the !LoadFunc interface is implemented, the function can be used to load 
tuples. If the !StoreFunc interface is implemented, the function can be used to 
store tuples. Since loading and storing are usually tied to each other, most 
functions implement both interfaces, as !PigStorage and !BinStorage do. 
Occasionally, however, a function may be written only for loading.

{{{
import java.io.IOException;
import java.io.InputStream;

import org.apache.pig.data.Tuple;

/**
 * This interface is used to implement functions to parse records
 * from a dataset.
 */
public interface LoadFunc {
        /**
         * Specifies a portion of an InputStream to read tuples from. Because
         * the starting and ending offsets may not be on record boundaries, it
         * is up to the implementor to work out the actual starting and ending
         * offsets in such a way that an arbitrarily sliced-up file will be
         * processed in its entirety.
         * <p>
         * A common way of handling slices in the middle of records is to start
         * at the given offset and, if the offset is not zero, skip to the end
         * of the first record (which may be a partial record) before reading
         * tuples. Reading continues until a tuple has been read that ends at an
         * offset past the ending offset.
         *
         * @param fileName the name of the file to be read
         * @param is the stream representing the file to be processed
         * @param offset the offset at which to start reading tuples
         * @param end the ending offset for reading
         * @throws IOException
         */
        public abstract void bindTo(String fileName, InputStream is,
                        long offset, long end) throws IOException;

        /**
         * Retrieves the next tuple to be processed.
         *
         * @return the next tuple to be processed, or null if there are no more
         *         tuples to be processed
         * @throws IOException
         */
        public abstract Tuple getNext() throws IOException;
}
}}}
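
The slice-handling strategy described in the bindTo comment can be sketched 
as follows. This is only an illustration, not code from Pig: the class 
!LineSliceLoader and its helper loadSlice are hypothetical, records are 
assumed to be newline-delimited, the stream is assumed to start at byte 0 of 
the file, and plain strings stand in for Tuple so that the sketch compiles 
without Pig on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader for newline-delimited records; not part of Pig. */
class LineSliceLoader {
    private InputStream in;
    private long pos; // offset of the next unread byte
    private long end; // ending offset of the slice

    public void bindTo(String fileName, InputStream is, long offset, long end)
            throws IOException {
        this.in = new BufferedInputStream(is);
        this.end = end;
        this.pos = 0;
        // Assumption: the stream starts at byte 0, so advance to the offset.
        while (pos < offset && in.read() >= 0) {
            pos++;
        }
        // A non-zero offset may fall inside a record: discard everything up
        // to the end of that record. The previous slice reads it instead.
        if (offset != 0) {
            readLine();
        }
    }

    /** Returns the next record, or null once the slice is exhausted. */
    public String getNext() throws IOException {
        // A record is read if it starts at or before the ending offset,
        // even if it finishes past it.
        if (in == null || pos > end) {
            return null;
        }
        return readLine();
    }

    /** Reads up to and including the next newline; null at end of file. */
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawByte = false;
        int b;
        while ((b = in.read()) >= 0) {
            pos++;
            sawByte = true;
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        return sawByte ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    /** Convenience helper for the sketch: loads one slice of an in-memory file. */
    static List<String> loadSlice(byte[] data, long offset, long end) {
        try {
            LineSliceLoader loader = new LineSliceLoader();
            loader.bindTo("in-memory", new ByteArrayInputStream(data), offset, end);
            List<String> records = new ArrayList<>();
            for (String r = loader.getNext(); r != null; r = loader.getNext()) {
                records.add(r);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this rule, a record that straddles a slice boundary is read by the slice 
in which it starts and skipped by the next one, so adjacent slices neither 
drop nor duplicate records.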

The !StoreFunc interface is defined as follows:

{{{

import java.io.IOException;
import java.io.OutputStream;

import org.apache.pig.data.Tuple;

/**
 * This interface is used to implement functions to write records
 * from a dataset.
 */
public interface StoreFunc {
        /**
         * Specifies the OutputStream to write to. This will be called before
         * putNext(Tuple) is invoked.
         *
         * @param os the stream to write tuples to
         * @throws IOException
         */
        public abstract void bindTo(OutputStream os) throws IOException;

        /**
         * Writes a tuple to the output stream to which this instance was
         * previously bound.
         *
         * @param f the tuple to store
         * @throws IOException
         */
        public abstract void putNext(Tuple f) throws IOException;

        /**
         * Does any post-processing needed once the last tuple has been
         * stored. DO NOT CLOSE THE STREAM in this method. The stream will be
         * closed later, outside of this function.
         *
         * @throws IOException
         */
        public abstract void finish() throws IOException;
}
}}}
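
The store-side contract (bind once, write each tuple, then finish without 
closing the stream) can be sketched as follows. This is an illustration under 
stated assumptions, not Pig's implementation: the class !TabDelimitedStorer 
and its helper storeAll are hypothetical, the tab-delimited output format is 
chosen only for the example, and a List of fields stands in for Tuple so that 
the sketch compiles without Pig on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical tab-delimited storer; not part of Pig. */
class TabDelimitedStorer {
    private OutputStream os;

    /** Remembers the stream; called once before any putNext call. */
    public void bindTo(OutputStream os) throws IOException {
        this.os = os;
    }

    /** Writes one record as tab-separated fields followed by a newline. */
    public void putNext(List<?> fields) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append('\t');
            }
            line.append(fields.get(i));
        }
        line.append('\n');
        os.write(line.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Flushes pending output but leaves the stream open for the caller. */
    public void finish() throws IOException {
        os.flush();
    }

    /** Convenience helper for the sketch: stores records into a string. */
    static String storeAll(List<? extends List<?>> records) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            TabDelimitedStorer storer = new TabDelimitedStorer();
            storer.bindTo(out);
            for (List<?> record : records) {
                storer.putNext(record);
            }
            storer.finish();
            // The caller, not the storer, closes the stream (here via
            // try-with-resources).
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the close outside the storer mirrors the warning in the finish 
comment: the code that opened the stream remains responsible for closing it.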
