Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "LoadStoreRedesignProposal" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreRedesignProposal?action=diff&rev1=16&rev2=17

--------------------------------------------------

   1. How will we work with compressed files?  !FileInputFormat already works 
with bzip and gzip compressed files, producing reasonable splits.  !PigStorage 
will be reworked to depend on !FileInputFormat (or a descendant thereof, see 
next item) and should therefore be able to use this functionality. Currently 
Pig supports gz/bzip for arbitrary loadfunc/storefunc combinations. With this 
proposal, gz/bzip format will only be supported for load/store using PigStorage.
  
  
- === Implementation details and status ===
+ == Implementation details and status ==
  
- ==== Current status ====
+ === Current status ===
  A branch, 'load-store-redesign' 
(http://svn.apache.org/repos/asf/hadoop/pig/branches/load-store-redesign), has 
been created to undertake work on this proposal. As of today (Nov 2, 2009) this 
branch has simple load-store working for PigStorage and BinStorage. Joins on 
multiple inputs and multi-store queries with multi-query optimization also 
work. Some of the recent changes in the proposal above (the changes noted under 
Nov 2, 2009 in the Changes below) have not been incorporated. A list of 
remaining tasks (possibly not comprehensive) is given in a subsection below.
  
- ==== Notes on implementation details ====
+ === Notes on implementation details ===
+ This section documents the changes at a high level, to give an overall 
connected picture that code comments may not provide. 
  
+ ==== Changes to work with Hadoop !InputFormat model ====
+ 
+ ==== Changes to work with Hadoop !OutputFormat model ====
+ 
- ==== Remaining Tasks ====
+ === Remaining Tasks ===
-  * BinStorage needs to implement LoadMetadata's getSchema() to replace 
current determineSchema()
+  * !BinStorage needs to implement !LoadMetadata's getSchema() to replace 
current determineSchema()
   * piggybank loaders/storers need to be ported
-  * fix lineage code to use LoadCaster instead of LoadFunc
+  * fix lineage code to use !LoadCaster instead of !LoadFunc
   * local mode needs to be ported
-  * PigDump needs to be ported
+  * !PigDump needs to be ported
-  * poload needs to be ported
+  * !POLoad needs to be ported
   * Need to handle passing loadfunc-specific info between different instances 
of loadfunc (different instances in the front end, and 
  between front end and back end - we need what is required in PIG-602) 
(setPartitionFilter() and pushOperators(), for example, need 
  this - these methods are called in the front end but the information passed 
is needed in the back end)
-  * For ResourceSchema to be effectively used for communicating schema, we 
must fix the two level access issues with 
+  * For !ResourceSchema to be effectively used for communicating schema, we 
must fix the two level access issues with 
  schema of bags in current schema before we make these changes, otherwise that 
same contagion will afflict us here. 
   * Input/Output handler code in streaming needs to be ported 
   * split by file will have to be removed from the language
   * fix code with FIXME in comment relating to load-store redesign
-  * Decide on what we should do with ReversibleLoadFunc and multiquery 
optimization
+  * Decide on what we should do with !ReversibleLoadFunc and multiquery 
optimization
  
  
  
