Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "LoadStoreRedesignProposal" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreRedesignProposal?action=diff&rev1=12&rev2=13

--------------------------------------------------

  '''!StoreFunc'''
  
  {{{
+ 
+ /**
+ * This interface is used to implement functions that write records
+ * to a dataset.
+ */
+ 
  public interface StoreFunc {
+ 
+     /**
+      * This method is called by the Pig runtime in the front end to convert the
+      * output location to an absolute path if the location is relative. The
+      * StoreFunc implementation is free to choose how it converts a relative
+      * location to an absolute location since this may depend on what the location
+      * string represents (an hdfs path or some other data source).
+      * 
+      * @param location location as provided in the "store" statement of the script
+      * @param curDir the current working directory based on any "cd" statements
+      * in the script before the "store" statement. If there are no "cd" statements
+      * in the script, this would be the home directory - 
+      * <pre>/user/<username> </pre>
+      * @return the absolute location based on the arguments passed
+      * @throws IOException if the conversion is not possible
+      */
+     String relToAbsPathForStoreLocation(String location, Path curDir) throws IOException;
  
      /**
       * Return the OutputFormat associated with StoreFunc.  This will be called
       * on the front end during planning and not on the backend during
-      * execution.  OutputFormat information need not be carried to the back end
-      * as the appropriate RecordWriter will be provided to the StoreFunc.
+      * execution. 
+      * @return the {@link OutputFormat} associated with StoreFunc
+      * @throws IOException if an exception occurs while constructing the 
+      * OutputFormat
-      */
+      *
+      */
-     OutputFormat getOutputFormat();
+     OutputFormat getOutputFormat() throws IOException;
  
      /**
      * Communicate to the store function the location used in Pig Latin to refer 
@@ -327, +353 @@

       * called during planning on the front end, not during execution on
       * the backend.
       * @param location Location indicated in store statement.
+      * @param job The {@link Job} object
       * @throws IOException if the location is not valid.
       */
-     void setLocation(String location) throws IOException;
+     void setStoreLocation(String location, Job job) throws IOException;
   
      /**
       * Set the schema for data to be stored.  This will be called on the
+      * front end during planning. A Store function should implement this function to
-      * front end during planning.  If the store function wishes to record
-      * the schema it will need to carry it to the backend.
-      * Even if a store function cannot
-      * record the schema, it may need to implement this function to
       * check that a given schema is acceptable to it.  For example, it
       * can check that the correct partition keys are included;
       * a storage function to be written directly to an OutputFormat can
       * make sure the schema will translate in a well defined way.  
-      * @param schema to be checked/set
+      * @param s the schema to be checked
-      * @throw IOException if this schema is not acceptable.  It should include
+      * @throws IOException if this schema is not acceptable.  It should include
       * a detailed error message indicating what is wrong with the schema.
       */
-     void setSchema(ResourceSchema s) throws IOException;
+     void checkSchema(ResourceSchema s) throws IOException;
  
      /**
       * Initialize StoreFunc to write data.  This will be called during
       * execution before the call to putNext.
       * @param writer RecordWriter to use.
+      * @throws IOException if an exception occurs during initialization
       */
-     void prepareToWrite(RecordWriter writer);
+     void prepareToWrite(RecordWriter writer) throws IOException;
- 
-     /**
-      * Called when all writing is finished.  This will be called on the backend,
-      * once for each writing task.
-      */
-     void doneWriting();
  
      /**
       * Write a tuple to the output stream to which this instance was
       * previously bound.
       * 
-      * @param f the tuple to store.
+      * @param t the tuple to store.
-      * @throws IOException
+      * @throws IOException if an exception occurs during the write
       */
      void putNext(Tuple t) throws IOException;
- 
-     /**
-      * Called when writing all of the data is finished.  This can be used
-      * to commit information to a metadata system, clean up tmp files, 
-      * close connections, etc.  This call will be made on the front end
-      * after all back end processing is finished.
-      * @param conf The job configuration 
-      */
-     void allFinished(Configuration conf);
- 
- 
  
  }
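  
  // A minimal sketch of an implementation of the revised interface,
  // illustrative only: DummyTextStorer is a hypothetical class that
  // delegates to Hadoop's TextOutputFormat, and the Pig package names
  // for Tuple and ResourceSchema are assumptions.
  
  import java.io.IOException;
  
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.OutputFormat;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
  
  import org.apache.pig.ResourceSchema;
  import org.apache.pig.data.Tuple;
  
  public class DummyTextStorer implements StoreFunc {
  
      private RecordWriter writer;
  
      public String relToAbsPathForStoreLocation(String location, Path curDir)
              throws IOException {
          // Resolve a relative location against the current working
          // directory; absolute locations are returned unchanged.
          Path p = new Path(location);
          return p.isAbsolute() ? location : new Path(curDir, location).toString();
      }
  
      public OutputFormat getOutputFormat() throws IOException {
          return new TextOutputFormat();
      }
  
      public void setStoreLocation(String location, Job job) throws IOException {
          // Hand the output location to the underlying OutputFormat;
          // FileOutputFormat records it in the Job's configuration.
          FileOutputFormat.setOutputPath(job, new Path(location));
      }
  
      public void checkSchema(ResourceSchema s) throws IOException {
          // Plain text can represent any schema; a real implementation
          // would verify partition keys, field types, etc. here.
      }
  
      public void prepareToWrite(RecordWriter writer) throws IOException {
          this.writer = writer;
      }
  
      public void putNext(Tuple t) throws IOException {
          try {
              // Write the tuple's fields as one tab-delimited text line.
              writer.write(null, t.toDelimitedString("\t"));
          } catch (InterruptedException e) {
              throw new IOException(e);
          }
      }
  }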
  
@@ -529, +537 @@

 * setLocation() now also takes a Job argument since the main purpose of this call is to give the LoadFunc implementation an opportunity to communicate the input location to the underlying InputFormat. InputFormat implementations in turn appear to store this information in the Job. For example, FileInputFormat has the following static method to set the input location: setInputPaths(JobConf conf, String commaSeparatedPaths). A sketch follows this list.
 * Removed doneReading() method since there is already a RecordReader.close() method which Hadoop will call, and in which any work needed on completion of reading can be done.
 * All methods can now throw IOException - this keeps the interface more flexible for exception cases
- 
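
A rough sketch of the setLocation() point above; DummyTextLoader is a hypothetical loader and the body assumes a FileInputFormat-based input:

{{{
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class DummyTextLoader /* implements LoadFunc */ {

    // Communicate the input location to the underlying InputFormat,
    // which records it in the Job's configuration.
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }
}
}}}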
  In LoadMetadata:
 * getSchema(), getStatistics() and getPartitionKeys() methods now take a location and Configuration argument so that the implementation can use them when returning the requested information. A sketch follows.
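
A rough sketch of the resulting LoadMetadata shape; the return types and the String[] return for partition keys are assumptions for illustration, not settled signatures:

{{{
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceStatistics;

public interface LoadMetadata {

    // Schema of the data at the given location.
    ResourceSchema getSchema(String location, Configuration conf)
        throws IOException;

    // Statistics for the data at the given location, if available.
    ResourceStatistics getStatistics(String location, Configuration conf)
        throws IOException;

    // Names of the partition keys, if the data is partitioned.
    String[] getPartitionKeys(String location, Configuration conf)
        throws IOException;
}
}}}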
+ In StoreFunc:
+  * Added relToAbsPathForStoreLocation() method per http://issues.apache.org/jira/browse/PIG-879?focusedCommentId=12768818&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12768818
+  * Methods which did not throw IOException now do so to enable exceptions in implementations
+  * Removed doneWriting() - same functionality already present in RecordWriter.close() and OutputCommitter.commitTask()
+  * Changed setSchema() to checkSchema() since this method is called only to allow the StoreFunc to check the schema
+  * Removed allFinished() - same functionality already present in OutputCommitter.cleanupJob() (see the committer sketch below)
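
A sketch of where the doneWriting() and allFinished() work now lives, using a hypothetical committer derived from Hadoop's FileOutputCommitter:

{{{
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

public class DummyStoreCommitter extends FileOutputCommitter {

    public DummyStoreCommitter(Path outputPath, TaskAttemptContext context)
            throws IOException {
        super(outputPath, context);
    }

    // Per-task completion work that a StoreFunc used to do in doneWriting().
    @Override
    public void commitTask(TaskAttemptContext context) throws IOException {
        super.commitTask(context);
    }

    // Job-level completion work that used to go in allFinished(), e.g.
    // committing metadata or cleaning up tmp files.
    @Override
    public void cleanupJob(JobContext context) throws IOException {
        super.cleanupJob(context);
    }
}
}}}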
  
