Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "LoadStoreRedesignProposal" page has been changed by AlanGates:
http://wiki.apache.org/pig/LoadStoreRedesignProposal?action=diff&rev1=6&rev2=7

  type conversion on this data will be done in the same way as noted above for 
!InputFormatLoader.
  
  Open Questions:
-  1. Does all this force us to switch to Hadoop for local mode as well?  We 
aren't opposed to using Hadoop for local mode; it just needs to get reasonably 
fast.  Can we use !InputFormat ''et al.'' on local files without using the 
whole HDFS structure?
+  1. Does all this force us to switch to Hadoop for local mode as well?  We 
aren't opposed to using Hadoop for local mode; it just needs to get reasonably 
fast.  Can we use !InputFormat ''et al.'' on local files without using the 
whole HDFS structure?  '''Answer:''' According to the Hadoop documentation, 
!TextInputFormat works on local files as well as HDFS files.  We may need to 
detect that we are in local mode and prefix the filename with `file://`.
+  1. How will we work with compressed files?  !FileInputFormat already works 
with bzip and gzip compressed files, producing reasonable splits.  !PigStorage 
will be reworked to depend on !FileInputFormat (or a descendant thereof, see 
the next item) and should therefore be able to use this functionality.
+  1. How will the need for mark and seek in index construction for merge join 
be handled?  In the long term we'd like Hadoop to handle this for us by 
creating a !SeekableInputFormat that would add this functionality.  In the 
meantime we can extend !FileInputFormat to !PigFileInputFormat.  We can add a 
`getPos()` call to this class that will provide a position to start reading 
at to find the tuple being indexed.  Note that this position will not 
necessarily be the exact position of the tuple, but a position from which the 
tuple can be found.  We can also change the `getSplits` call on this class to 
return a split that is specific to a given position so that it can be used 
during the join.
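
The idea in open question 3, an index entry that stores "a position from which 
the tuple can be found" rather than its exact position, can be illustrated 
with a small plain-Java sketch.  This is not the proposed !PigFileInputFormat 
and uses no Hadoop classes; the class name `SparseIndexSketch`, the methods 
`buildIndex` and `find`, and the `stride` parameter are hypothetical names 
chosen for illustration.  It assumes a tab-delimited ASCII file sorted by its 
first field.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a sparse index over a key-sorted, tab-delimited file.
class SparseIndexSketch {

    // Record the byte offset of every stride-th line, keyed by that line's
    // first field.  Lines in between are not indexed at all.
    static TreeMap<String, Long> buildIndex(Path file, int stride) throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long pos = in.getFilePointer();   // offset of the line about to be read
            String line;
            for (int n = 0; (line = in.readLine()) != null; n++) {
                if (n % stride == 0) {
                    index.putIfAbsent(keyOf(line), pos);
                }
                pos = in.getFilePointer();
            }
        }
        return index;
    }

    // Seek to the offset of the nearest indexed key strictly below the target
    // (or to the start of the file), then scan forward until the key is
    // found or passed.  Returns the first matching line, or null.
    static String find(Path file, TreeMap<String, Long> index, String key) throws IOException {
        Map.Entry<String, Long> entry = index.lowerEntry(key);
        long start = (entry == null) ? 0L : entry.getValue();
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(start);
            String line;
            while ((line = in.readLine()) != null) {
                int cmp = keyOf(line).compareTo(key);
                if (cmp == 0) return line;
                if (cmp > 0) break;           // sorted input: the key is absent
            }
        }
        return null;
    }

    // First tab-delimited field of a line, or the whole line if it has no tab.
    static String keyOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }
}
```

The lookup starts from a key strictly below the target so that duplicate keys 
straddling an indexed offset are not skipped; a real implementation would 
also have to deal with split boundaries and compressed input.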
  
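
For open question 1, the filename rewrite mentioned in the answer might look 
roughly like the following plain-Java sketch.  The class `LocalPathQualifier`, 
the method `qualify`, and the `localMode` flag are hypothetical; how Pig 
actually detects local mode is not specified here, and no Hadoop API is used.

```java
import java.nio.file.Paths;

// Illustrative only: qualify bare filenames with file:// in local mode.
class LocalPathQualifier {

    // Names that already carry a scheme (hdfs://, file://, ...) and all names
    // outside local mode are returned unchanged.  Otherwise the name is made
    // absolute and converted to a file: URI string.
    static String qualify(String filename, boolean localMode) {
        if (!localMode || filename.contains("://")) {
            return filename;
        }
        return Paths.get(filename).toAbsolutePath().toUri().toString();
    }
}
```

Relative names are resolved against the current working directory, which may 
or may not be the behavior wanted for local mode.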
  == Changes ==
  Sept 23 2009, Gates
@@ -478, +480 @@

  Sept 25 2009, Gates
   * Added allFinished call to !StoreFunc
  
+ Sept 29 2009, Gates
+  * Added answer for open question 1.  Added and answered open questions 2 and 
3.
+ 
